Preface


This book grew out of the notes for a course I gave for undergraduate physics 
students at the University of Texas. In this book I think I go farther forward 
than is usual in undergraduate courses, giving readers a taste of nuclear physics 
and quantum field theory. I also go farther back than is usual, starting with the 
struggle in the nineteenth century to establish the existence and properties of 
atoms, including the development of thermodynamics that both aided in this 
struggle and offered an alternative program. 

I fear that some readers may want to skim through this early part and hurry 
on to what they regard as the good stuff, quantum mechanics and relativity. That 
would be a pity. In my experience physics students who aim at a career in atomic 
or nuclear or elementary particle physics often manage to get through their 
formal education without ever becoming familiar with entropy, or equipartition, 
or viscosity, or diffusion. That was true in my own case. This book, or a course 
based on it, may provide some students with their last chance to learn about 
these and other matters needed to understand the macroscopic world. 

Readers may find this book unusual also in its strong emphasis on history. 
I make a point of saying a little about the welter of theoretical guesswork and ill- 
understood experiments out of which modern physics emerged in the twentieth 
century. This, it seems to me, is a help in understanding what otherwise may 
seem an arbitrary set of postulates for relativity and quantum mechanics. It is 
also a matter of personal taste. Research in physics seems to me to lose some of 
its excitement if we do not see it as part of a great historical progression. Some 
valuable historical works are listed in a bibliography, along with collections of 
original articles that I have found most helpful. 

But this is not a work of history. Historians aim at uncovering how the scien- 
tists of the past thought about their own problems — for instance, how Einstein 
in 1905 thought about the measurement of space and time separations in de- 
veloping the special theory of relativity. For this aim of historical writing it is 
necessary to go deeply into personal accounts, institutional development, and 
false starts, and to put aside our knowledge of subsequent progress. I try to
be accurate in describing the state of physics in past times, but the aim of this 
book in discussing the problems of the past is different: it is to make clear how 
physicists think about these things today. 

This book is intended chiefly for physics students who are well into their time 
as undergraduates, and for working scientists who want a brief introduction to 
some area of modern physics. I have therefore not hesitated to use calculus and 
matrix algebra, though not in advanced versions. As required by the subject 
matter, the mathematical level here slopes upwards through the book. Where 
possible I have chosen concrete rather than abstract formulations of physical 
theories. For instance, in Chapter 5, on quantum mechanics, I mostly represent 
physical states as wave functions, only coming at the end of the chapter to their 
representation as vectors in Hilbert space. In some sections detailed material 
that can be skipped without losing the thread of the theory is put into appendices. 
Two of these appendices present what in my unbiased opinion are improved 
derivations of important results: the appendix to Section 2.6 gives a revised 
version of Einstein’s derivation of his formula for the diffusion constant in 
Brownian motion, and the appendix to Section 6.4 presents a revision of Fermi’s 
calculation of the rate of alpha decay. 

In my experience, with some judicious pruning, the material of the book up 
to about the middle of Chapter 5 can be covered in a one-term undergraduate 
course. But I think that to go over the whole book would take a full two-term 
academic year. 

This book treats such a broad range of topics that it is impossible to go very 
far into any of them. Certainly its treatment of quantum mechanics, statistical 
mechanics, transport theory, nuclear physics, and quantum field theory is no 
substitute for graduate-level courses on these topics, any one of which would 
occupy at least a whole year. This book presents what I think, in an ideal 
world, the ambitious physics student would already know when he or she enters 
graduate school. At least, it is what I wish that I had known when I entered 
graduate school. 

In any case, I hope that the student or reader may be sufficiently interested in 
what I do discuss that they will want to go into these topics in greater detail in 
more specialized books or courses, and that they will find in this book a good 
preparation for such further studies. 

I am grateful to many students and colleagues for pointing out errors in 
the lecture notes on which this book is based and for the expert and friendly 
assistance I have received from Simon Capelin and Vince Higgs, the editors at 
Cambridge University Press who guided the publication of this book. 


STEVEN WEINBERG 


1 
Early Atomic Theory 


It is an old idea that matter consists of atoms, tiny indivisible particles moving 
in empty space. This theory can be traced to Democritus, working in the Greek 
city of Abdera, on the north shore of the Aegean Sea. In the late 400s BC
Democritus proclaimed that “atoms and void alone exist in reality.” He offered
neither evidence for this hypothesis nor calculations on which to base predic- 
tions that could confirm it. Nevertheless, this idea was tremendously influential, 
if only as an example of how it might be possible to account for natural phe- 
nomena without invoking the gods. Atoms were brought into the materialistic 
philosophy of Epicurus of Samos, who a little after 300 BC founded one of 
the four great schools of Athens, the Garden. In turn, the idea of atoms and 
the philosophy of Epicurus were invoked in the poem On the Nature of Things 
by the Roman Lucretius. After this poem was rediscovered in 1417 it influ- 
enced Machiavelli, More, Shakespeare, Montaigne, and Newton, among others. 
Newton in his Opticks speculated that the properties of matter arise from the 
clustering of atoms into larger particles, which themselves cluster into larger 
particles, and so on. As we will see, Newton made a stab at an atomic theory of 
air pressure, but without significant success. 

The serious scientific application of the atomic theory began in the eighteenth 
century, with calculations of the properties of gases, which had been studied 
experimentally since the century before. This is the topic with which we begin 
this chapter. Applications to chemistry and electrolysis followed in the nine- 
teenth century and will be considered in subsequent sections. The final section 
of this chapter describes how the nature of atoms began to be clarified with the 
discovery of the electron. In the following chapter we will see how it became 
possible to estimate the atoms’ masses and sizes.¹


¹ Further historical details about some of these matters can be found in Weinberg, The Discovery of
Subatomic Particles, listed in the bibliography.


1.1 Gas Properties 


Experimental Relations 


The upsurge of enthusiasm for experiment in the seventeenth century was 
largely concentrated on the properties of air. The execution and reports of these 
experiments did not depend on hypotheses regarding atoms, but we need to 
recall them here because their results provided the background for later theories 
of gas properties that did rely on assumptions about atoms. 

It had been thought by Aristotle and his followers that the suction observed 
in pumps and bellows arises from nature’s abhorrence of a vacuum. This notion 
was challenged in the 1640s by the invention of the barometer by the Florentine 
polymath Evangelista Torricelli (1608-1647). If nature abhors a vacuum, then 
when a long glass tube with one end closed is filled with mercury and set 
upright with the closed end on top, why does the mercury flow out of the bottom 
until the column is only 760 mm high, with empty space appearing above the 
mercury? Is there a limit to how much nature abhors a vacuum? Torricelli 
argued that the mercury is held up instead by the pressure of the air acting 
on the open end of the glass tube (or on the surface of a bath of mercury in 
which the open end of the tube is immersed), which is just sufficient to support 
a column of mercury 760 mm high. If so, then it should be possible to measure 
variations in air pressure using a column of mercury in a vertical glass tube, a 
device that we know as a barometer. Such measurements were made from 1648 
to 1651 by Blaise Pascal (1623-1662), who found that the height of mercury 
in a barometer is decreased by moving to the top of a mountain, where less air 
extends above the barometer. 
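
As a rough numerical check of Torricelli's argument, the pressure that supports a 760 mm column can be estimated from the hydrostatic relation p = ρgh. The sketch below assumes a mercury density of 13,595 kg/m³ (near 0 °C) and a standard gravitational acceleration of 9.80665 m/s²; neither value appears in the text, and the function names are illustrative only.

```python
# Hydrostatic pressure at the base of a liquid column: p = rho * g * h.
RHO_MERCURY = 13595.0   # kg/m^3, assumed density of mercury near 0 °C
G_STANDARD = 9.80665    # m/s^2, assumed standard gravitational acceleration

def column_pressure(height_m, density=RHO_MERCURY, g=G_STANDARD):
    """Pressure in pascals exerted by a liquid column of the given height."""
    return density * g * height_m

def column_height(pressure_pa, density=RHO_MERCURY, g=G_STANDARD):
    """Height in meters of the liquid column that a given pressure supports."""
    return pressure_pa / (density * g)

if __name__ == "__main__":
    p = column_pressure(0.760)
    print(f"760 mm of mercury corresponds to about {p:.0f} Pa")
    # The same pressure supports a much taller column of water (1000 kg/m^3).
    print(f"equivalent water column: {column_height(p, density=1000.0):.2f} m")
```

With these assumed values the result is close to 1.013 × 10⁵ Pa. The same pressure would hold up about 10 m of water, which is one reason mercury makes for a barometer of convenient size.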

The quantitative properties of air pressure soon began to be studied 
experimentally, before there was any correct theoretical understanding of gas 
properties. In 1662, in the second edition of his book New Experiments Physico- 
Mechanical Concerning the Spring of the Air and its Effects, the Anglo-Irish 
aristocrat Robert Boyle (1627-1691) described experiments relating the pressure
(the “spring of the air”) and volume of a fixed mass of air. He studied a
sample of air enclosed at the end of a glass tube by a column of mercury in 
the tube. The air was compressed at constant temperature by pushing on the 
mercury’s surface, revealing what came to be known as Boyle’s law, that for 
constant temperature the volume of a gas of fixed mass and composition is 
inversely proportional to the pressure, now defined by Boyle as the force per 
area exerted on the gas. 


Temperature Scales 


A word must be said about the phrase “at constant temperature.” Boyle lived 
before the establishment of our modern Fahrenheit and Celsius scales, whose 
forerunners go back respectively to 1724 and 1742. But, although in Boyle’s
time no meaningful numerical value could be given to the temperature of any 
given body, it was nevertheless possible to speak with precision of two bodies 
being at the same temperature: they are at the same temperature if when put in 
contact neither body is felt to grow appreciably hotter or colder. Boyle’s glass 
tube could be kept at constant temperature by immersing it in a large bath, say of 
water from melting ice. Later the Fahrenheit temperature scale was established 
by defining the temperature of melting ice as 32 °F and the temperature of 
boiling water at mean atmospheric pressure as 212 °F, and defining a 1 °F 
increase of temperature by etching 212 − 32 equal divisions between 32 and
212 on the glass tube of a mercury thermometer. Likewise, in the Celsius scale, 
the temperatures of melting ice and boiling water are 0 °C and 100 °C, and 
1 °C is the temperature difference required to increase the volume of mercury 
in a thermometer by 1% of the volume change in heating from melting ice 
to boiling water. As we will see in the next chapter, there is a more sophis- 
ticated universal definition of temperature, to which scales based on mercury 
thermometers provide only a good approximation. 
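
Because both scales are fixed by the same two reference points, converting between them is a linear interpolation. The following is a minimal sketch using only the fixed points quoted above; the function names are illustrative.

```python
# Linear conversion between two scales defined by the same two fixed points.
ICE_F, BOIL_F = 32.0, 212.0   # melting ice and boiling water, Fahrenheit
ICE_C, BOIL_C = 0.0, 100.0    # the same two fixed points, Celsius

def fahrenheit_to_celsius(t_f):
    """Map a Fahrenheit reading to Celsius by interpolating between fixed points."""
    return ICE_C + (t_f - ICE_F) * (BOIL_C - ICE_C) / (BOIL_F - ICE_F)

def celsius_to_fahrenheit(t_c):
    """Inverse map, from a Celsius reading to Fahrenheit."""
    return ICE_F + (t_c - ICE_C) * (BOIL_F - ICE_F) / (BOIL_C - ICE_C)

if __name__ == "__main__":
    for t_f in (32.0, 98.6, 212.0):
        print(f"{t_f:6.1f} °F = {fahrenheit_to_celsius(t_f):6.1f} °C")
```

One Celsius degree spans 180/100 = 1.8 Fahrenheit degrees, which is all that the two definitions encode.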

After the temperature scale was established it became possible to carry out a 
quantitative study of the relation between volume and temperature, with pres- 
sure and mass kept fixed by enclosing the air in a vessel with flexible walls, 
which expand or contract to keep the pressure inside equal to the air pressure 
outside. This relation was announced in an 1802 lecture by Joseph Louis Gay- 
Lussac (1775-1850), who attributed it to unpublished work in the 1780s by 
Jacques Charles (1746-1823). The relation, subsequently known as Charles’ 
Law, is that at constant pressure and mass the volume of gas is proportional 
to 𝒯 − 𝒯₀, where 𝒯 is the temperature measured for instance with a mercury
thermometer and 𝒯₀ is a constant whose numerical value naturally depends
on the units used for temperature: 𝒯₀ = −459.67 °F = −273.15 °C. Thus
𝒯₀ is absolute zero, the minimum possible temperature, at which the gas volume
vanishes. Using Celsius units for temperature differences, the absolute
temperature T = 𝒯 − 𝒯₀ is known today as the temperature in degrees Kelvin,
denoted K.
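
To make Charles' law concrete, here is a small sketch that converts a Celsius reading to absolute temperature and rescales a gas volume at constant pressure and mass. It uses only the value of absolute zero quoted above; the function names are illustrative.

```python
T0_CELSIUS = -273.15  # absolute zero on the Celsius scale

def kelvin(t_celsius):
    """Absolute temperature corresponding to a Celsius thermometer reading."""
    return t_celsius - T0_CELSIUS

def charles_volume(v1, t1_celsius, t2_celsius):
    """Volume at t2 of a fixed mass of gas at constant pressure, given v1 at t1."""
    return v1 * kelvin(t2_celsius) / kelvin(t1_celsius)

if __name__ == "__main__":
    # One liter of gas warmed from melting ice to boiling water.
    print(f"{charles_volume(1.0, 0.0, 100.0):.4f} liters")
```

Warming a gas from melting ice to boiling water therefore increases its volume by a factor 373.15/273.15, a little more than a third.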


Theoretical Explanations 


In Proposition 23 of his great book, the Principia, Isaac Newton (1643-1727) 
made an attempt to account for Boyle’s law by considering air to consist of 
particles repelling each other at a distance. Using little more than dimensional 
analysis, he showed that the pressure p of a fixed mass of air is inversely 
proportional to the volume V if the repulsive force between particles separated 
by a distance r falls off as 1/r. But as he pointed out, if the repulsive force goes 
as 1/r², then p ∝ V^(−4/3). He did not claim to offer any reason why the repulsive
force should go as 1/r and, as we shall see, it is not forces that go as 1/r but
rather forces of very short range that act only in collisions that mostly account
for the properties of gases. 
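
The two cases Newton considered fit a simple scaling pattern. The sketch below is a heuristic dimensional argument, not Newton's own geometric proof. It assumes that only neighboring particles repel one another, that their spacing r scales as V^(1/3), and that the number of neighbor pairs acting across a unit area scales as 1/r². The function name is hypothetical.

```python
from fractions import Fraction

def pressure_volume_exponent(n):
    """Exponent a in p ∝ V**a for a neighbor repulsion falling off as 1/r**n.

    Heuristic: spacing r ∝ V**(1/3); force per neighbor pair ∝ r**(-n);
    pairs acting across unit area ∝ r**(-2); hence p ∝ r**(-(n + 2)).
    """
    return Fraction(-(n + 2), 3)

if __name__ == "__main__":
    for n in (1, 2):
        print(f"force ∝ 1/r^{n}  ->  p ∝ V^({pressure_volume_exponent(n)})")
```

Under these assumptions a 1/r force reproduces Boyle's law, p ∝ 1/V, while a 1/r² force gives p ∝ V^(−4/3).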

It was the Swiss mathematical physicist Daniel Bernoulli (1700-1782) who 
made the first attempt to understand the properties of gases theoretically, on the 
assumption that a gas consists of many tiny particles moving freely except in 
very brief collisions. In 1738, in the chapter, “On the Properties and Motions of 
Elastic Fluids, Especially Air” of his book Hydrodynamics, he argued that in a 
gas (then called an “elastic fluid”) with n particles per unit volume moving with 
a velocity v that is the same (because of collisions) in all directions, the pressure 
is proportional to n and to v², because the number of particles that hit any given
area of the wall in a given time is proportional to the number in any given 
volume, to the rate at which they hit the wall, which is proportional to v, and to 
the force that each particle exerts on the wall, which is also proportional to v. 
For a fixed mass of gas n is inversely proportional to the volume V, so pV is 
proportional to v². If (as Bernoulli thought) v² depends only on the temperature,
this explains Boyle’s law. If v² is proportional to the absolute temperature, it
also gives Charles’ law. 

Bernoulli did not give much in the way of mathematical details, and did not 
try to say to what else the pressure might be proportional besides nv², a matter
crucial for the history of chemistry. These details were provided by Rudolf 
Clausius (1822-1888) in 1857, in an article entitled “The Nature of the Motion 
which We Call Heat.” Below is a more-or-less faithful description of Clausius’ 
derivation, in a somewhat different notation. 

Suppose a particle hits the wall of a vessel and remains in contact with it for a
small time t, during which it exerts a force with component F along the inward
normal to the wall. Its momentum in the direction of the inward normal to the
wall will decrease by an amount Ft, so if the component of the velocity of the
particle before it strikes the wall is v⊥ > 0, and it bounces back elastically with
normal velocity component −v⊥, the change in the inward normal component
of momentum is −2mv⊥, where m is the particle mass, so


F = 2mv⊥/t.


Now, suppose that this goes on with many particles hitting the wall over a time
interval T ≫ t, all particles with the same velocity vector v. The number N of
particles that will hit an area A of the wall in this time is the number of particles
in a cylinder with base A and height v⊥T, or


N = nAv⊥T,


where n is the number density, the number of particles per volume. Each of 
these particles is in contact with the wall for a fraction t/T of the time T, so the
total force exerted on the wall is


FN(t/T) = (2mv⊥/t) × nAv⊥T × (t/T) = 2nmv⊥²A.

We see that all dependence on the times t and T cancels. The pressure p is
defined as the force per area, so this gives the relation


p = 2nmv⊥². (1.1.1)


This is for the unphysical case in which every particle has the same value of v⊥,
positive in the sense that the particles are assumed to be going toward the wall.
In the real world, different particles will be moving with different speeds in
different directions, and Eq. (1.1.1) should be replaced with


p = 2nm × (1/2)⟨v⊥²⟩ = nm⟨v⊥²⟩, (1.1.2)


the brackets indicating an average over all gas particles, with the factor 1/2
inserted in the first expression because only 50% of these particles will be going
toward any given wall area.

To express ⟨v⊥²⟩ in terms of the root mean square velocity, Clausius assumed
without proof that “on the average each direction [of the particle velocities]
is equally represented.” In this case, the average square of each component of
velocity equals ⟨v⊥²⟩ and the average of the squared velocity vector is then


⟨v²⟩ = ⟨v₁²⟩ + ⟨v₂²⟩ + ⟨v₃²⟩ = 3⟨v⊥²⟩


and therefore Eq. (1.1.2) reads


p = nm⟨v²⟩/3. (1.1.3)


This is essentially the result p ∝ n⟨v²⟩ of Bernoulli, except that, with the
factor m/3, Eq. (1.1.3) is now an equality, not just a statement of proportionality.
For a fixed mass M of gas occupying a volume V, the number density
is n = M/mV, so Clausius could use Boyle’s law (which he called Mariotte’s
law), which states that pV is constant for fixed temperature, to conclude that
for a given gas ⟨v²⟩ depends only on the temperature. Further, as Clausius
remarked, Eq. (1.1.3) together with Charles’ law (which Clausius called the
law of Gay-Lussac) indicates that ⟨v²⟩ is proportional to the absolute temperature
T. If we like, we can adopt a modern notation and write the constant of
proportionality as 3k/m, so that


m⟨v²⟩/3 = kT, (1.1.4)


and therefore Eq. (1.1.3) reads


p = nkT, (1.1.5)


where k is a constant, in the sense of being independent of p, n, and T. But
the choice of notation does not tell us whether k varies from one type of gas 
to another or whether it depends on the molecular mass m. Clausius could not 
answer this question, and did not offer any theoretical justification for Boyle’s 
law or Charles’ law. Clausius deserves to be called the founder of thermodynamics,
discussed in Sections 2.2 and 2.3, but these are not questions that
can be answered by thermodynamics alone. As we will see in the following 
section, experiments in the chemistry of gases indicated that k is the same for 
all gases, a universal constant now known as Boltzmann’s constant, but the 
theoretical explanation for this and for Boyle’s law and Charles’ law had to 
wait for the development of kinetic theory and statistical mechanics, the subject 
of Section 2.4. 
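
The isotropy step that takes Eq. (1.1.2) into Eq. (1.1.3) can be checked numerically. The sketch below is only an illustration, not part of Clausius' argument. It assumes that all particles share a single speed, draws their directions uniformly over the sphere by normalizing Gaussian triples, and takes the wall normal along the x axis. All names are hypothetical.

```python
import math
import random

def random_direction(rng):
    """Unit vector uniformly distributed over the sphere (normalized Gaussian triple)."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return x / norm, y / norm, z / norm

def pressure_estimates(n, m, speed, samples=100_000, seed=0):
    """Return (sampled n*m*<v_perp^2> as in Eq. 1.1.2, n*m*v^2/3 as in Eq. 1.1.3)."""
    rng = random.Random(seed)
    sum_perp_sq = 0.0
    for _ in range(samples):
        ux, _uy, _uz = random_direction(rng)
        sum_perp_sq += (speed * ux) ** 2   # component along the wall normal (x axis)
    mean_perp_sq = sum_perp_sq / samples
    return n * m * mean_perp_sq, n * m * speed ** 2 / 3.0

if __name__ == "__main__":
    sampled, exact = pressure_estimates(n=1.0, m=1.0, speed=1.0)
    print(f"sampled: {sampled:.4f}   n m v^2 / 3: {exact:.4f}")
```

The two numbers should agree to within the statistical scatter of the sample, which shrinks as the inverse square root of the number of samples.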

As indicated by the title of his article, “The Nature of the Motion which We 
Call Heat,” Clausius was concerned to show that, at least in gases, the phenomenon
of heat is explained by the motion of the particles of which gases are
composed. He defended this view by using his theory to calculate the specific 
heat of gases, a topic to be considered in the next chapter. 


1.2 Chemistry 


Elements 


The idea that all matter is composed of a limited number of elements goes back 
to the earliest speculations about the nature of matter. At first, in the century 
before Socrates, it was supposed that there is just one element: water (Thales) or 
air (Anaximenes) or fire (Heraclitus) or earth (perhaps Xenophanes). The idea of 
four elements was proposed around 450 BC by Empedocles of Acragas (modern 
Agrigento). In On Nature he identified the elements as “fire and water and earth 
and the endless height of air.” Classical Chinese sources list five elements: water,
fire, earth, wood, and metal. 

Like the theory of atoms, these early proposals of elements did not come 
accompanied with any evidence that these really are elements, or any suggestion 
how such evidence might be gained. Plato in Timaeus even doubled down and 
stated that the difference between one element and another arises from the 
shapes of the atoms of which the elements are composed: earth atoms are tiny 
cubes, while the atoms of fire, air, and water are other regular polyhedra — 
solids bounded respectively by 4, 8, or 20 identical regular polygons, with every 
edge and every vertex of each solid the same as every other edge or vertex of 
that solid. 

By the end of the middle ages this list of elements had come to seem implau- 
sible. It is difficult to identify any particular sample of dirt as the element earth, 
and fire seems more like a process than a substance. Alchemists narrowed the 
list of elements to just three: mercury, sulfur, and salt. 

Modern chemistry began around the end of the eighteenth century, with 
careful experiments by Joseph Priestley (1733-1804), Henry Cavendish (1743- 
1810), Antoine Lavoisier (1743-1794), and others. By 1787 Lavoisier had 
worked out a list of 55 elements. In place of air there were several gases:
hydrogen, oxygen, and nitrogen; air was identified as a mixture of nitrogen and 
oxygen. There were other non-metals on the list of elements: sulfur, carbon, 
and phosphorus, and a number of common metals: iron, copper, tin, lead, silver, 
gold, mercury. Lavoisier also listed as elements some chemicals that we now 
know are tightly bound compounds: lime, soda, and potash. And the list also 
included heat and light, which of course are not substances at all. 


Law of Combining Weights 


Chemistry was first used to provide quantitative information about atoms by 
John Dalton (1766-1844), the son of a poor weaver. His laboratory notebooks 
from 1802 to 1804 describe careful measurements of the weights of elements 
combining in compounds. He discovered that these weights are always in fixed 
ratios. For instance, he found that when hydrogen burns in oxygen, 1 gram of 
hydrogen combines with 5.5 grams of oxygen, giving 6.5 grams of water, with 
nothing left over. Under the assumption that one particle of water consists of 
one atom of hydrogen and one atom of oxygen, one oxygen atom must weigh 
5.5 times as much as one hydrogen atom. 

As we will see, water was soon discovered to be H2O: two atoms of 
hydrogen to each atom of oxygen. If Dalton had known this, he would have 
concluded that an oxygen atom weighs 5.5 times as much as two hydrogen 
atoms, i.e., 11 times the weight of one hydrogen atom. Of course, more accurate 
measurements later revealed that 1 gram of hydrogen combines with about 8 
grams of oxygen, so one oxygen atom weighs eight times the weight of two 
hydrogen atoms, or 16 times as much as one hydrogen atom. Atomic weights 
soon became defined as the weights of atoms relative to the weight of one 
hydrogen atom, so the atomic weight of oxygen is 16. (This is only approximate. 
Today the atomic weight of the atoms of the most common isotope of carbon 
is defined to be precisely 12; with this definition, the atomic weights of the 
most common isotopes of hydrogen and oxygen are measured to be 1.007825 
and 15.99491.) 
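
The arithmetic of combining weights is simple enough to state as a one-line function. The sketch below uses the combining weights quoted above, and takes the assumed chemical formula as an input, since that assumption is exactly what Dalton lacked the means to check. The function name is illustrative.

```python
def atomic_weight_ratio(grams_x, grams_h, atoms_x=1, atoms_h=1):
    """Weight of one atom of X relative to one hydrogen atom.

    grams_x and grams_h are the weights that combine; atoms_x and atoms_h are
    the numbers of each atom in one molecule under the assumed formula.
    """
    return (grams_x / atoms_x) / (grams_h / atoms_h)

if __name__ == "__main__":
    print(atomic_weight_ratio(5.5, 1.0))             # Dalton's data, formula HO
    print(atomic_weight_ratio(5.5, 1.0, atoms_h=2))  # Dalton's data, formula H2O
    print(atomic_weight_ratio(8.0, 1.0, atoms_h=2))  # later data, formula H2O
```

The three calls give 5.5, 11, and 16, the values discussed in the text.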

The following table compares Dalton’s assumed formulas for a few common 
compounds with the correct formulas: 


Compound        Dalton formula   True formula
Water           HO               H2O
Carbon dioxide  CO2              CO2
Ammonia         NH               NH3
Sulfuric acid   SO2              H2SO4

Here is a list of the approximate true atomic weights for a few elements,
the weights deduced by Dalton, and (in the column marked with an asterisk) the 
weights Dalton would have calculated if he had known the true chemical 
formulas. 


Element   True   Dalton   Dalton*
H         1      1        1
C         12     4.3      8.6
N         14     4.2      12.6
O         16     5.5      11
S         32     14.4     57.6


To make progress in measuring atomic weights, it was evidently necessary 
to find some way of working out the correct formulas for various chemical 
compounds. This was provided by the study of chemical reactions in gases. 


Law of Combining Volumes 


On December 31, 1808, Gay-Lussac read a paper to the Société Philomathique
in Paris, in which he announced his observation that gases at the same tem- 
perature and pressure always combine in definite proportions of volumes. For 
instance, two liters of hydrogen combine with one liter of oxygen to give water 
vapor, with no hydrogen or oxygen left over. Likewise, one liter of nitrogen 
combines with three liters of hydrogen to give ammonia gas, with nothing left 
over. And so on. 

The correct interpretation of this experimental result was given in 1811 by 
Count Amedeo Avogadro (1776-1856) in Turin. Avogadro’s principle states
that equal volumes of gases at the same temperature and pressure always con- 
tain equal numbers of the gas particles, which Avogadro called “molecules,” 
particles that may consist of single atoms or of several atoms of the same or 
different elements joined together. The observation that water vapor is formed 
from a volume of oxygen combined with a volume of hydrogen twice as large 
shows, according to Avogadro’s principle, that molecules of water are formed 
from twice as many molecules of hydrogen as molecules of oxygen, which is 
not what Dalton had assumed. 

There was a further surprise in the data. Two liters of hydrogen combined 
with one liter of oxygen give not one but two liters of water vapor. This is not 
what one would expect if oxygen and hydrogen molecules consist of single 
atoms and water molecules consist of two atoms of hydrogen and one atom 
of oxygen. In that case two liters of hydrogen plus one liter of oxygen would 
produce one liter of water vapor. Avogadro could conclude that if, as seemed 
plausible, molecules of water contain two atoms of hydrogen and one atom of
oxygen, the molecules of oxygen and hydrogen must each contain two atoms. 
That is, taking water molecules as H2O, the reaction for producing molecules 
of water is 


2H2 + O2 → 2H2O.


The use of Avogadro’s principle rapidly provided the correct formulas for gases 
such as CO2, NH3, NO, and so on. Knowing these formulas and measuring 
the weights of gases participating in various reactions, it was possible to cor- 
rect Dalton’s atomic weights and calculate more reliable values for the atomic 
weights of the atoms in gas molecules, relative to any one of them. Taking the 
atomic weight of hydrogen as unity, this gave atomic weights close to 12 for 
carbon, 14 for nitrogen, 16 for oxygen, 32 for sulfur, and so on. Then, knowing 
these atomic weights, it became possible to find atomic weights for many other 
elements, not just those commonly found in gases, by measuring the weights of 
elements combining in various chemical reactions. 


The Gas Constant 


As we saw in the previous section, in 1857 Clausius had shown that in a gas
consisting of n particles of mass m per volume with mean square velocity ⟨v²⟩,
the pressure is p = nm⟨v²⟩/3. Using Charles’ law, he concluded that ⟨v²⟩ is
proportional to absolute temperature. Writing this relation as m⟨v²⟩/3 = kT
with k some constant gives Eq. (1.1.5), p = nkT. But this in itself does not
tell us how k varies from one gas to another. This is answered by Avogadro’s
principle. With N particles in a volume V, the number density is n = N/V, so
Eq. (1.1.5) can be written


pV = NkT. (1.2.1)


If as stated by Avogadro the number of molecules in a gas with a given pressure, 
volume, and temperature is the same for any gas, then k = pV/NT must be the 
same for any gas. Clausius did not draw this conclusion, perhaps because there 
was then no known theoretical basis for Avogadro’s principle. The universality 
of the constant k, and hence Avogadro’s principle, were explained later by 
kinetic theory, to be covered in the next chapter. The constant k came to be 
called Boltzmann’s constant, after Ludwig Boltzmann, who as we shall see was
one of the chief founders of kinetic theory. 

The molecular weight μ of any compound is defined as the sum of the atomic
weights of the atoms in a single molecule. The actual mass m of a molecule
is its molecular weight times the mass m₁ of a hypothetical atom with atomic
weight unity:


m = μm₁. (1.2.2)

In the modern system of atomic weights, with the atomic weight of the most
common isotope of carbon defined as precisely 12, m₁ = 1.660539 × 10⁻²⁴ g,
which of course was not known in Avogadro’s time. A mass M contains
N = M/m = M/m₁μ molecules, so the ideal gas law (1.2.1) can be written


pV = MkT/m₁μ = (M/μ)RT, (1.2.3)


where R is the gas constant


R = k/m₁. (1.2.4)


Physicists in the early nineteenth century could use Eq. (1.2.3) to measure R, 
and they found a value close to the modern value R = 8.314 J/K per mole. This would
have allowed a determination of m₁ and hence of the masses of all atoms of
known atomic weight if k were known, but k did not become known until the 
developments described in Section 2.6. 
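
Equations (1.2.3) and (1.2.4) can be checked against modern numbers. This minimal sketch assumes the modern value k = 1.380649 × 10⁻²³ J/K, which is not given in the text, together with a unit-atomic-weight mass of 1.660539 × 10⁻²⁴ g. The function names are illustrative.

```python
K_BOLTZMANN = 1.380649e-23   # J/K, assumed modern value of Boltzmann's constant
M1_GRAMS = 1.660539e-24      # g, mass of a hypothetical atom of unit atomic weight

def gas_constant(k=K_BOLTZMANN, m1_grams=M1_GRAMS):
    """R = k/m1 as in Eq. (1.2.4), numerically in joules per kelvin per mole."""
    return k / m1_grams

def gas_mass_grams(mu, pressure_pa, volume_m3, temperature_k):
    """Mass M in grams from Eq. (1.2.3), pV = (M/mu) R T."""
    return mu * pressure_pa * volume_m3 / (gas_constant() * temperature_k)

if __name__ == "__main__":
    print(f"R = {gas_constant():.4f} J/K per mole")
    # One liter of oxygen gas (molecular weight 32) at 0 °C and one atmosphere.
    print(f"M = {gas_mass_grams(32.0, 101325.0, 1.0e-3, 273.15):.3f} g")
```

Read in the other direction, the same relation gives m₁ = k/R, which is why a measurement of k would fix the masses of all atoms of known atomic weight.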


Avogadro’s Number 


Incidentally, a mole of any element or compound of molecular weight μ is
defined as μ grams, so in Eq. (1.2.3) the ratio M/μ with M expressed in grams equals
the number of moles of gas. Since N = M/m₁μ, one mole contains a number of
molecules equal to 1/m₁ with m₁ given in grams. This is known as Avogadro’s
number. But of course Avogadro did not know Avogadro’s number. It is now
known to be 6.02214 × 10²³ molecules per mole, corresponding to unit molecular
weight m₁ = 1.66054 × 10⁻²⁴ grams. The measurement of Avogadro’s
number was widely recognized in the late nineteenth century as one of the great 
challenges facing physics. 
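
The statement that one mole contains 1/m₁ molecules can be verified directly. The sketch below takes m₁ = 1.66054 × 10⁻²⁴ g as its only input; the function names are hypothetical.

```python
M1_GRAMS = 1.66054e-24   # g, mass of a hypothetical atom of unit atomic weight

def avogadro_number(m1_grams=M1_GRAMS):
    """Number of molecules in one mole, 1/m1 with m1 in grams."""
    return 1.0 / m1_grams

def molecules_in(mass_grams, mu, m1_grams=M1_GRAMS):
    """Number of molecules N = M/(m1 * mu) in a mass M of molecular weight mu."""
    return mass_grams / (m1_grams * mu)

if __name__ == "__main__":
    print(f"Avogadro's number: {avogadro_number():.5e} per mole")
    # 18 grams of water (molecular weight about 18) is one mole.
    print(f"molecules in 18 g of water: {molecules_in(18.0, 18.0):.5e}")
```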


1.3 Electrolysis 


Early Electricity 


Electricity was known in the ancient world, as what we now call static 
electricity. Amber rubbed with fur was seen to attract or repel small bits of light 
material. Plato in Timaeus mentions “marvels concerning the attraction of 
amber.” (This is where the word electricity comes from; the Greek word for 
amber is “elektron.”)

Electricity began to be studied scientifically in the eighteenth century. Two 
kinds of electricity were distinguished: resinous electricity is left on an amber 
rod when rubbed with fur, while vitreous electricity is left on a glass rod when 
rubbed with silk. Unlike charges were found to attract each other, while like 
charges repel each other. Benjamin Franklin (1706-1790) gave our modern 
terms positive and negative to vitreous and resinous electricity, respectively. 

In 1785 Charles-Augustin de Coulomb (1736–1806) reported that the force
F between two bodies carrying charges q₁ and q₂ separated by a distance r is


F = kₑq₁q₂/r², (1.3.1)


where kₑ is a universal constant. For like and unlike charges the product q₁q₂
is positive or negative, respectively, indicating a repulsive or attractive force. 
Coulomb had no way of actually measuring these charges, but he could reduce 
the charge on a body by a factor 2 by touching it to an uncharged body of the 
same material and size, and observe that this reduces the force between it and 
any other charged body by the same factor 2. The introduction of our modern 
units of electric charge had to wait until the quantitative study of magnetism. 
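
Coulomb's halving procedure is easy to mimic numerically with the inverse-square law F = kₑq₁q₂/r². This sketch assumes the modern SI value kₑ ≈ 8.9875517923 × 10⁹ N m²/C², with charges in coulombs and distances in meters. Those units were not available to Coulomb, and the function name is illustrative.

```python
K_E = 8.9875517923e9  # N m^2 / C^2, assumed modern SI value of the constant k_e

def coulomb_force(q1, q2, r, k_e=K_E):
    """Signed force k_e*q1*q2/r**2: positive is repulsive, negative is attractive."""
    return k_e * q1 * q2 / r ** 2

if __name__ == "__main__":
    full = coulomb_force(1.0e-6, 1.0e-6, 0.1)
    half = coulomb_force(0.5e-6, 1.0e-6, 0.1)   # one charge halved by sharing
    print(f"force ratio after halving one charge: {half / full:.2f}")
    print(f"unlike charges give a negative (attractive) force: "
          f"{coulomb_force(1.0e-6, -1.0e-6, 0.1):.3f} N")
```

Because the force is linear in each charge, halving one of the charges halves the force whatever the unit of charge, which is what allowed Coulomb to test the law without measuring any charge absolutely.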


Early Magnetism 


Magnetism too was known in the ancient world, as what we now call per- 
manent magnetism. The Greeks knew of naturally occurring lodestones that 
could attract or repel small bits of iron. Plato’s Timaeus refers to lodestones as 
“Heraclean stones.” (Our word magnet comes from the city Magnesia in Asia 
Minor, near where lodestones were commonly found.) 

Very early the Chinese also discovered the lodestone and used it as a magnetic 
compass (a “‘south-seeking stone”) for purposes of geomancy and navigation. 
Each lodestone has a south-seeking pole at one end, attracted to a point near the 
South Pole of the Earth, and a north-seeking pole at the other end, attracted to a 
point near the Earth’s North Pole. Magnetism was first studied scientifically by 
William Gilbert (1544-1603), court physician to Elizabeth I. It was observed 
that the south-seeking poles of different lodestones repel each other, and like- 
wise for the north-seeking poles, while the south-seeking pole of one lodestone 
attracts the north-seeking pole of another lodestone. Gilbert concluded that one 
pole of a lodestone is pulled toward the north and the other toward the south 
because the Earth itself is a magnet, with what in a lodestone would be its 
south-seeking and north-seeking poles respectively near the Earth’s North Pole 
and South Pole. 


Electromagnetism 


It began to be possible to explore the relations between electricity and magnetism
quantitatively with the invention in 1800 of electric batteries by Count
Alessandro Volta (1745-1827). These were stacks of disks of two different 
metals separated by cardboard disks soaked in salt water. Such batteries drive 
steady currents of electricity through wires attached to the ends of the stacks, 
with positive and negative terminals identified respectively as the ends of the 
stacks from which and towards which electric current flows. 

In July 1820 Hans Christian Oersted (1777-1851) in Copenhagen noticed that
turning on an electric current deflected a nearby compass needle, and concluded 
that electric currents exert force on magnets. Conversely, he found also that 
magnets exert force on wires carrying electric currents. 

These discoveries were carried further in Paris a few months later by André-Marie
Ampère (1775-1836), who found that wires carrying electric current
exert force on each other. For two parallel wires of length $L$ carrying electric
currents (charge per second) $I_1$ and $I_2$, and separated by a distance $r \ll L$, the
force is

$$F = \frac{k_m I_1 I_2 L}{r} \tag{1.3.2}$$


where $k_m$ is another universal constant. The force is attractive if the currents
are in the same direction, and repulsive if they are in opposite directions. One ampere is
defined so that $F = 2 \times 10^{-7} \times L/r$ newtons if $I_1 = I_2 = 1$ ampere. (That
is, $k_m = 2 \times 10^{-7}$ N/ampere$^2$.) The electromagnetic unit of electric charge, the
coulomb, is defined as the electric charge carried in one second by a current 
of one ampere. A modern ammeter measures electric currents by observing the 
magnetic force produced by current flowing through a wire loop. 

The connection between electricity and magnetism was strengthened in 1831 
by Michael Faraday (1791-1867), at the Royal Institution in London. He dis- 
covered that changing magnetic fields generate electric forces that can drive 
currents in conducting wires. This is the principle underlying the generation of 
electric currents today. Electricity began soon after to have important practical 
applications, with the invention in 1837 of the electric telegraph by the American
painter Samuel F. B. Morse (1791-1872).

Finally, in the 1860s, the great Scottish physicist James Clerk Maxwell
(1831-1879) showed that the consistency of the equation for the generation 
of magnetic fields by electric currents required that magnetic fields are also 
generated by changing electric fields. In particular, while oscillating magnetic 
fields produce oscillating electric fields, also oscillating electric fields produce 
oscillating magnetic fields, so a self-sustaining oscillation in both electric and 
magnetic fields can propagate in apparently empty space. Maxwell calculated 
the speed of its propagation and found it to equal $\sqrt{2k_e/k_m}$,² numerically about
equal to the measured speed of light, suggesting strongly that light is such a 
self-sustaining oscillation in electric and magnetic fields. We will see more of 
Maxwell’s equations in subsequent chapters, especially in Chapters 4 and 5. 


2 This quantity is independent of the units used for electric charge as long as the currents appearing in 
Eq. (1.3.2) are defined as the rates of flow of charge in the same units as used in Eq. (1.3.1). It is obviously 
also independent of the units used for force, as long as the same force units are used in Eqs. (1.3.1) and 
(1.3.2). 

Discovery of Electrolysis


Electrolysis was discovered in 1800 by the chemist William Nicholson (1753- 
1815) and the surgeon Anthony Carlisle (1768-1840). They found that bubbles 
of hydrogen and oxygen would be produced where wires attached respectively 
to the negative and positive terminals of a Volta-style battery were inserted in 
water. Sir Humphry Davy (1778-1829), Faraday’s boss at the Royal Institution,
carried out extensive experiments on the electrolysis of molten salts, finding 
for instance that, in the electrolysis of molten table salt, sodium, a previously 
unknown metal, was produced at the wire attached to the negative terminal of 
the battery and a greenish gas, chlorine, was produced at the wire attached 
to the other, positive, terminal. Davy’s electrolysis experiments added several 
metals aside from sodium to Lavoisier’s list of elements, including aluminum, 
potassium, calcium, and magnesium. 

A theory of electrolysis was worked out by Faraday. In modern terms, a
small fraction ($1.8 \times 10^{-9}$ at room temperature) of water molecules are normally
dissociated into positive hydrogen ions (H⁺), which are attracted to the
wire attached to the negative terminal of a battery, and negative hydroxyl ions
(OH⁻), which are attracted to the wire attached to the positive terminal. At the
wire attached to the negative terminal, two H⁺ ions combine with two units of
negative charge from the battery to form a neutral H₂ molecule. At the wire
attached to the positive terminal, four OH⁻ ions give one O₂ molecule plus
two H₂O molecules plus four units of negative charge, which flow through the
battery to the negative terminal.³

Likewise, a small fraction of molten table salt (NaCl) molecules are normally
dissociated into Na⁺ ions and Cl⁻ ions. At the wire attached to the negative
terminal of a battery, one Na⁺ ion plus one unit of negative charge gives one
atom of metallic sodium (Na); at the wire attached to the positive terminal, two
Cl⁻ ions give one chlorine (Cl₂) molecule and two units of negative charge,
which flow through the battery to the negative terminal.

In Faraday’s theory, it takes one unit of electric charge to convert a singly
charged ion such as H⁺ or Cl⁻ to a neutral atom or molecule, so since molecules
of molecular weight $\mu$ have mass $\mu m_1$, it takes $M/\mu m_1$ units of electric charge
to convert a mass $M$ of singly charged ions to a mass $M$ of neutral atoms
or molecules of molecular weight $\mu$. Experiment showed that it takes about
96 500 coulombs (e.g., one ampere for about 96 500 seconds) to convert $\mu$
grams (that is, one mole) of singly charged ions to neutral atoms or molecules.
(This is called a faraday; the modern value is 96 485.3 coulombs/mole.) Hence


3 We now know that it is negative charge, i.e., electrons, that flows through a battery. As far as Faraday knew,
it was equally possible that positive charges flow through a battery, in which case at the wire attached to the
negative terminal two H⁺ ions would give an H₂ molecule plus two units of positive charge, which would
flow through the battery to the wire attached to the positive terminal, where four OH⁻ ions plus four units
of positive charge would give an O₂ molecule and two H₂O molecules.
Faraday knew that $e/m_1 \approx 96\,500$ coulombs/gram, where $e$ is the unit of
electric charge, which was called an “electrine” in 1874 by the Irish physicist
George Johnstone Stoney (1826-1911). Having measured the faraday, if
physicists knew the value of $e$ then they would know $m_1$, but they didn’t have
this information until later. Also, no one then knew that $e$ is the charge of an
actual particle.
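
The logic of the last paragraph can be sketched numerically. The snippet below is only an illustration: it takes the rounded experimental value of the faraday quoted above and, as an assumption not available to Faraday, the modern value of the electron charge, and shows how knowing $e$ would immediately fix $m_1$.

```python
# Sketch: the faraday fixes only the ratio e/m_1; a value of e is needed to get m_1.
FARADAY = 96_500.0  # coulombs per gram of unit-atomic-weight ions (rounded value from the text)


def unit_mass_grams(e_coulombs: float) -> float:
    """Mass m_1 (in grams) of an atom of unit atomic weight, from e/m_1 = FARADAY."""
    return e_coulombs / FARADAY


# Assumption: modern value of the unit of charge, unknown in Faraday's time.
E_MODERN = 1.602e-19  # coulombs

m1 = unit_mass_grams(E_MODERN)
print(f"m_1 is roughly {m1:.2e} grams")
```

Any other assumed value of `E_MODERN` simply rescales `m1` in proportion, which is why the electrolysis measurements alone could not settle the size of atoms.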


1.4 The Electron 


As sometimes happens, in 1858 a new path in fundamental physics was opened 
with the invention of a practical device, in this case an improved air pump. In his 
pump the Bonn craftsman Heinrich Johann Geissler (1814-1879) used a column 
of mercury as a piston, in this way greatly reducing the leakage of air through 
the piston that had troubled all previous air pumps. With his pump Geissler was 
able to reduce the pressure in a closed glass tube to about a ten-thousandth of 
the typical air pressure on the Earth’s surface. 

With such a near vacuum in a glass tube, electric currents could travel without 
wires through the tube. It was discovered that an electric current would flow 
from a cathode, a metal plate attached to the negative terminal of a powerful 
electric battery, fly through a hole in an anode, another metal plate attached to 
the positive pole of the battery, and light up a spot on the far wall of the tube. 
Adding small amounts of various gases to the interior of the tube caused these 
cathode rays to light up, with orange or pink or blue-green light emitted along 
the path of the ray, when neon, helium, or mercury vapor was added. Using 
Geissler’s pumps, Julius Plücker (1801-1868) in 1858-1859 found that cathode
rays could be deflected by magnetic fields, thus moving the spot of light where 
the ray hits the glass at the tube end. 

In 1897 Joseph John Thomson (1856-1940), the successor to Maxwell as 
Cavendish Professor at Cambridge, began a series of measurements of the 
deflection of cathode rays. In his experiments, after the ray particles pass 
through the anode they feel an electric or magnetic force F exerted at a right 
angle to their direction of motion for a distance d along the ray. They then drift 
in a force-free region for a distance $D \gg d$ until they hit the end of the tube. If
a ray particle has velocity $v$ along the direction of the ray, it feels the electric
or magnetic force for a time $d/v$ and then drifts for a longer time $D/v$. A force
$F$ normal to the ray gives ray particles of mass $m$ a component of velocity
perpendicular to the ray that is equal to the acceleration $F/m$ times the time
$d/v$, so by the time they hit the end of the tube they have been displaced by an
amount

$$\text{displacement} = (F/m) \times (d/v) \times (D/v) = \frac{FdD}{mv^2}.$$

The forces exerted on a charge $e$ by an electric field $E$ or a magnetic field $B$ at
right angles to the ray are

$$F_{\rm elec} = eE, \qquad F_{\rm mag} = evB,$$

so

$$\text{electric displacement} = \frac{eEdD}{mv^2},$$

$$\text{magnetic displacement} = \frac{eBdD}{mv}.$$


Thomson wanted to measure e/m. He knew D, d, E, and B, but not v. He 
could eliminate v from these equations if he could measure both the electric and 
magnetic displacements, but the electric displacement was difficult to measure. 
A strong electric field tends to ionize any residual air in the tube, with positive 
and negative ions pulled to the negatively and positively charged plates that 
produce the electric fields, neutralizing their charges. Finally Thomson suc- 
ceeded in measuring the electric as well as the magnetic deflection by using 
a cathode ray tube with very low air pressure. (Both the electric and magnetic 
displacements were only a few inches.) This gave results for the ratio of charge 
to mass ranging from $6 \times 10^7$ to $10^8$ coulombs per gram.
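
The elimination of $v$ described above can be sketched in a few lines. Dividing the magnetic displacement by the electric one gives $Bv/E$, which yields $v$, and either displacement formula then gives $e/m$. The apparatus values below are hypothetical round numbers in SI units chosen for illustration; they are not Thomson's data.

```python
def charge_to_mass(E, B, d, D, x_elec, x_mag):
    """Solve the two displacement formulas for the velocity v and the ratio e/m.

    x_elec = e E d D / (m v^2)   and   x_mag = e B d D / (m v),
    so x_mag / x_elec = B v / E.
    """
    v = E * x_mag / (B * x_elec)
    e_over_m = x_mag * v / (B * d * D)
    return v, e_over_m


# Hypothetical apparatus (illustration only): field strengths and tube lengths.
E, B, d, D = 1.0e4, 5.0e-4, 0.05, 1.0   # volts/meter, tesla, meters, meters
v_true, ratio_true = 2.0e7, 1.0e11       # meters/sec, coulombs/kg (10^8 coulombs/gram)

# Displacements that such particles would show, from the formulas in the text.
x_elec = ratio_true * E * d * D / v_true**2
x_mag = ratio_true * B * d * D / v_true

v, ratio = charge_to_mass(E, B, d, D, x_elec, x_mag)
print(f"displacements: {x_elec:.3f} m and {x_mag:.3f} m")
print(f"recovered v = {v:.3e} m/s, e/m = {ratio:.3e} C/kg")
```

With these assumed inputs both displacements come out at about an eighth of a meter, the same order as the few inches mentioned above, and the function recovers the velocity and charge-to-mass ratio that were put in.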

Thomson compared this with the result that Faraday had found in measurements
of electrolysis, that $e/m_1 \approx 10^5$ coulombs per gram, where $e$ is the
electric charge of a singly ionized atom or molecule (such as a sodium ion in
the electrolysis of NaCl) and $m_1$ is the mass of a hypothetical atom of atomic
weight unity, close to the mass of the hydrogen atom. He reasoned that if the
particles in his cathode rays are the same as those transferred in electrolysis,
then their charge must be the same as $e$, so their mass must be about $10^{-3}m_1$.
Thomson concluded that since the cathode ray particles are so much lighter than 
ions or atoms, they must be the basic constituents of ions and atoms. 

Thomson had still not measured e or m. He had not even shown that cathode 
rays are streams of particles; they might be streams of electrically charged fluid, 
with any volume of fluid having a ratio of charge to mass equal to his measured 
e/m. Nevertheless, in the following decade it became widely accepted that 
Thomson had indeed discovered a particle present in atoms, and the particle 
came to be called the electron. 

Thermodynamics and Kinetic Theory


The successful uses of atomic theory described in the previous chapter did not 
settle the existence of atoms in all scientists’ minds. This was in part because 
of the appearance in the first half of the nineteenth century of an attractive 
competitor, the physical theory of thermodynamics. As we shall see in the first 
three sections of this chapter, with thermodynamics one may derive powerful 
results of great generality without ever committing oneself to the existence of 
atoms or molecules. But thermodynamics could not do everything. Section 2.4 
will describe the advent of kinetic theory, which is based on the assumption that 
matter consists of very large numbers of particles, and its generalization to sta- 
tistical mechanics. From these thermodynamics could be derived, and together 
with the atomic hypothesis it yielded results far more powerful than could be 
obtained from thermodynamics alone. Even so, it was not until the appearance 
of direct evidence for the graininess of matter, described in Section 2.5, that 
the existence of atoms became almost universally accepted. 


2.1 Heat and Energy 


The first step in the development of thermodynamics was the recognition that 
heat is a form of energy. Though so familiar to us today, this was far from 
obvious to the physicists and chemists of the early nineteenth century. Until the 
1840s heat was widely regarded as a fluid, named caloric by Lavoisier. Caloric 
theory was used to calculate the speed of sound by Pierre-Simon Laplace 
(1749-1827) in 1816, the conduction of heat by Joseph Fourier (1768-1830) in 
1807 and 1822, and the efficiency of steam engines by Sadi Carnot (1796-1832) 
in 1824, whose work as we will see in the next section became a foundation 
of thermodynamics. Adding to the confusion, other scientists considered heat 
as some sort of wave. This reflected uncertainty regarding the nature of what 
is now called infrared radiation, discovered by William Herschel (1738-1822) 
in 1800. 

Heat as Energy


In 1798 Benjamin Thompson (1753-1814), an American expatriate in Eng- 
land, offered evidence against the idea that heat is a fluid. (Thompson is also 
known as Count Rumford, a title he was given when he later served as military 
adviser in Bavaria.) It was well known that boring a cannon produces heat,
which might be supposed to be due to the liberation of caloric from the iron, 
but Rumford observed that if the heat is carried away by immersing the cannon 
in running water while it is being bored there is no limit to the heat that can be 
produced. 

The first measurement of the energy in heat was provided in the mid-1840s 
by James Prescott Joule (1818-1889). In his apparatus a falling weight turned 
paddles in a tank of water, heating the water. The gravitational force on a mass m 
kilograms is m times the acceleration of gravity, 9.8 meters/sec² or 9.8 newtons
per kilogram. Work is force times distance, so dropping one kilogram a distance 
of one meter gave it an energy equal to 9.8 newton meters, now also known as 
9.8 joules. Joule found that the paddles driven by this dropping weight would 
raise the temperature of 100 grams of water by 0.023 °C, so the paddles pro- 
duced heat equal to 0.023 x 100 calories, the calorie being defined as the heat 
required to raise the temperature of one gram of water by one degree Celsius. 
Hence Joule could conclude that 9.8 joules is equivalent to 2.3 calories, so one 
calorie is equivalent to 9.8/2.3 = 4.3 joules. The modern value is 4.184 joules. 
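
Joule's arithmetic can be retraced as a short sketch. The function name and argument names below are illustrative choices, not anything from Joule's paper; the numbers are the ones quoted in the paragraph above.

```python
G = 9.8  # acceleration of gravity in meters/sec^2, the value used in the text


def joules_per_calorie(mass_kg, drop_m, water_g, delta_t_c):
    """Mechanical work of the falling weight divided by the heat it produces."""
    work_joules = mass_kg * G * drop_m      # force times distance
    heat_calories = water_g * delta_t_c     # one calorie heats one gram by one degree Celsius
    return work_joules / heat_calories


# One kilogram dropped one meter warms 100 grams of water by 0.023 degrees Celsius.
ratio = joules_per_calorie(1.0, 1.0, 100.0, 0.023)
print(f"one calorie is about {ratio:.2f} joules")
```

The result, a little under 4.3 joules per calorie, is within a few percent of the modern value of 4.184.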

In 1847 the Prussian physician and physicist Hermann von Helmholtz 
(1821-1894) put forward the idea of the universal conservation of energy, 
whether in the form of kinetic or potential or chemical energy or heat. But 
what sort of energy is heat? For some nineteenth century physicists the question 
was irrelevant. They developed the science of heat known as thermodynamics, 
which did not depend on any detailed model of heat energy. But there was one 
context in which the nature of heat energy seemed evident. In his great 1857 
paper, The Nature of the Motion which We Call Heat, Clausius found that at 
least part of the heat energy of gases is the kinetic energy of their molecules. 


Kinetic Energy 


The concept of kinetic energy was long familiar. If a steady force $F$ is exerted
on a particle of mass $m$, it produces an acceleration $F/m$, so after a time $t$ the
velocity of the body is $v = Ft/m$. The distance traveled in this time is $t$ times
the average velocity $v/2$, and the work done on the particle is the force times
this distance:

$$F \times t \times Ft/2m = F^2 t^2/2m = mv^2/2.$$

Instead of this work going into heating a tub of water, as in the experiment of
Joule, it goes into giving the particle an energy $mv^2/2$.

This energy has the special property of being conserved when bodies come
into contact in collisions. Consider a collision between two rigid balls A and B
with initial vector velocities $\mathbf{v}_A$ and $\mathbf{v}_B$. For the moment suppose that the time
interval $t$ over which the balls exert force on each other is sufficiently brief that the forces acting on
the balls do not change appreciably during this time. The force that A exerts on
B is equal and opposite to the force $\mathbf{F}$ that B exerts on A, so Newton’s second
law tells us that the final velocities of A and B are $\mathbf{v}_A' = \mathbf{v}_A + \mathbf{F}t/m_A$ and
$\mathbf{v}_B' = \mathbf{v}_B - \mathbf{F}t/m_B$. Hence, as Newton showed, momentum is conserved:

$$m_A\mathbf{v}_A + m_B\mathbf{v}_B = m_A\mathbf{v}_A' + m_B\mathbf{v}_B'. \tag{2.1.1}$$


Neglecting changes in acceleration during the brief time $t$, the vector displacements
traveled by A and B equal $t$ times the average velocities, $[\mathbf{v}_A + \mathbf{v}_A']/2$
and $[\mathbf{v}_B + \mathbf{v}_B']/2$, respectively. If the balls remain in contact during this time
interval, then these displacements must be the same, so

$$\mathbf{v}_A + \mathbf{v}_A' = \mathbf{v}_B + \mathbf{v}_B'. \tag{2.1.2}$$


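To derive a second conservation law, rewrite Eq. (2.1.2) as $\mathbf{v}_A' - \mathbf{v}_B' = \mathbf{v}_B - \mathbf{v}_A$
and square this, giving

$$\mathbf{v}_A'^{2} - 2\mathbf{v}_B' \cdot \mathbf{v}_A' + \mathbf{v}_B'^{2} = \mathbf{v}_B^{2} - 2\mathbf{v}_B \cdot \mathbf{v}_A + \mathbf{v}_A^{2}.$$

Multiply this with $m_A m_B$ and add the square of Eq. (2.1.1), so that the scalar
products cancel. Dividing by $2(m_A + m_B)$, the result is another conservation law,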


$$\frac{m_A}{2}\mathbf{v}_A^{2} + \frac{m_B}{2}\mathbf{v}_B^{2} = \frac{m_A}{2}\mathbf{v}_A'^{2} + \frac{m_B}{2}\mathbf{v}_B'^{2}. \tag{2.1.3}$$

Equations (2.1.1) and (2.1.3) have been derived here only for the case in
which the particles are in contact only for a brief time interval during which
the force acting between the bodies is constant, but this is not an essential
requirement, for we can break up any time interval into a large number of brief
intervals in each of which the change in the force is negligible. Then, since
$m_A\mathbf{v}_A + m_B\mathbf{v}_B$ and $m_A\mathbf{v}_A^2/2 + m_B\mathbf{v}_B^2/2$ do not change in each interval, they do
not change at all, as long as the bodies exert forces on each other only when they
are in contact.

In 1669 Christiaan Huygens (1629-1695) reported in Journal des Scavans
that he had confirmed the conservation of the total of $mv^2/2$, probably by
observing collisions of pendulum bobs, for which initial and final velocities
could be precisely determined. Newton in the Principia called the conserved
quantity $mv$ the quantity of motion, while Huygens gave the name vis viva
(“living force”) to the conserved quantity $mv^2/2$. These two quantities have
since become known as momentum and kinetic energy.

On the other hand, it was essential in deriving the conservation of kinetic
energy that we assumed that particles interact only when in contact. This is
generally a good approximation in gases, but it is not valid in the presence of
long range forces, such as electromagnetic or gravitational forces. In such cases
kinetic energy is not conserved; it is only the sum of kinetic energy plus some
sort of potential energy that does not change.
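
The two conservation laws for contact collisions can be checked numerically. The sketch below is a one-dimensional illustration: the final velocities are obtained by solving Eqs. (2.1.1) and (2.1.2) as a pair of linear equations, and the masses and initial velocities are arbitrary hypothetical values.

```python
def collide(m_a, v_a, m_b, v_b):
    """Final velocities in one dimension from Eqs. (2.1.1) and (2.1.2).

    (2.1.1): m_a v_a + m_b v_b = m_a v_a' + m_b v_b'
    (2.1.2): v_a + v_a' = v_b + v_b'
    """
    total = m_a + m_b
    v_a_new = ((m_a - m_b) * v_a + 2.0 * m_b * v_b) / total
    v_b_new = ((m_b - m_a) * v_b + 2.0 * m_a * v_a) / total
    return v_a_new, v_b_new


def momentum(m_a, v_a, m_b, v_b):
    return m_a * v_a + m_b * v_b


def kinetic_energy(m_a, v_a, m_b, v_b):
    return 0.5 * m_a * v_a**2 + 0.5 * m_b * v_b**2


# Hypothetical masses (kilograms) and initial velocities (meters/sec).
m_a, m_b = 2.0, 3.0
v_a, v_b = 4.0, -1.0

v_a2, v_b2 = collide(m_a, v_a, m_b, v_b)
print("final velocities:", v_a2, v_b2)
print("momentum before/after:", momentum(m_a, v_a, m_b, v_b), momentum(m_a, v_a2, m_b, v_b2))
print("kinetic energy before/after:",
      kinetic_energy(m_a, v_a, m_b, v_b), kinetic_energy(m_a, v_a2, m_b, v_b2))
```

Only Eqs. (2.1.1) and (2.1.2) are used to find the final velocities, so the equality of the kinetic energies before and after is a genuine check of Eq. (2.1.3) rather than an input.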


Specific Heat 


The total kinetic energy of $N$ gas molecules, each of mass $m$, with mean square
velocity $\langle v^2 \rangle$ is $Nm\langle v^2 \rangle/2$. Clausius had found the relation (1.1.4) between
mean square velocity and absolute temperature, according to which $m\langle v^2 \rangle/2 = 3kT/2$,
where $k$ is some constant (later identified as a universal constant of
nature), so the total kinetic energy is $3NkT/2$. A mass $M$ of gas of molecular
weight $\mu$ contains $N = M/\mu m_1$ molecules, so the total kinetic energy is
$3MRT/2\mu$, where $R = k/m_1$ is the gas constant (1.2.4). Clausius concluded
that to raise the temperature of a mass $M$ of gas of molecular weight $\mu$ by
an amount $dT$ at constant volume, so that the gas does no work on its container,
requires an energy $dE = 3MR\,dT/2\mu$. The ratio $dE/M\,dT$ is known as
the specific heat, so Clausius found that the specific heat of a gas at constant
volume is

$$C_v = 3R/2\mu. \tag{2.1.4}$$


This result must be distinguished from the value for a different sort of specific 
heat, measured at constant pressure, such as when the gas is in a container with 
an expandable wall, for which the volume V can change to keep the pressure p 
equal to the pressure of the surrounding air or other medium. When pressure 
pushes a surface of area A a small distance dL, the work done is the force pA 
times dL, which equals pdV where dV = AdL is the change in volume. 
According to the ideal gas law (1.2.3), $pV = RTM/\mu$, so if the temperature
is increased by an amount $dT$, then at constant pressure the gas does work
$p\,dV = RM\,dT/\mu$, and this temperature increase therefore requires an energy
$3MR\,dT/2\mu + MR\,dT/\mu = 5MR\,dT/2\mu$. In other words, the specific heat at
constant pressure is

$$C_p = 5R/2\mu. \tag{2.1.5}$$

This result is often expressed in terms of the ratio of specific heats,

$$\gamma = C_p/C_v. \tag{2.1.6}$$


So Clausius found that if all the heat of a gas is contained in the kinetic energy
of its molecules then $\gamma = 5/3$.

This did not agree with measurements of the specific heats of common
diatomic gases, such as oxygen or hydrogen, which Clausius cited as giving
$\gamma = 1.421$. Later, it was found that $\gamma$ does indeed equal 5/3 for a monatomic
gas like mercury vapor, but this left the question, in what form is the energy in 
ordinary gases that are not monatomic? 

To deal with this issue, Clausius suggested that the internal energy of a gas is
larger than the kinetic energy of the molecules, say by a factor $1 + f$, with $f$
some positive number. Then instead of Eq. (2.1.4) we have

$$C_v = (1 + f) \times 3R/2\mu, \tag{2.1.7}$$

and in place of Eq. (2.1.5),

$$C_p = (1 + f)\,3R/2\mu + R/\mu. \tag{2.1.8}$$

The specific heat ratio is then

$$\gamma = 1 + \frac{2}{3(1 + f)}. \tag{2.1.9}$$


This is often expressed (especially in astrophysics) as a formula for the internal
energy density $\mathcal{E}$ in terms of the pressure and $\gamma$:

$$\mathcal{E} = 3RTM(1 + f)/2\mu V = 3(1 + f)p/2 = p/(\gamma - 1). \tag{2.1.10}$$


The observation that $\gamma \approx 1.4$ for diatomic gases like O₂ and H₂ indicated
that the internal energy of these gases is larger than the kinetic energy of their
molecules by a factor $1 + f \approx 5/3$. Measurements gave values of $\gamma$ for more
complicated molecules like H₂O or CO₂ even closer to unity, indicating that
$f$ is even larger for these molecules. The reason for these values for $f$ and $\gamma$
did not become clear until the formulation of the equipartition of energy, to be
discussed in Section 2.4.
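
As a quick check of the numbers quoted here, Eq. (2.1.9) can be inverted to give $f$ for a measured $\gamma$. The helper names below are illustrative only.

```python
def gamma_from_f(f):
    """Specific heat ratio from Eq. (2.1.9): gamma = 1 + 2 / (3 (1 + f))."""
    return 1.0 + 2.0 / (3.0 * (1.0 + f))


def f_from_gamma(gamma):
    """Invert Eq. (2.1.9) to find the excess-energy parameter f."""
    return 2.0 / (3.0 * (gamma - 1.0)) - 1.0


for label, gamma in [("monatomic, gamma = 5/3", 5.0 / 3.0),
                     ("diatomic, gamma = 1.4", 1.4),
                     ("Clausius's cited value, gamma = 1.421", 1.421)]:
    print(f"{label}: 1 + f = {1.0 + f_from_gamma(gamma):.3f}")
```

For $\gamma = 5/3$ this gives $1 + f = 1$, so no energy beyond the kinetic energy of translation, while $\gamma = 1.4$ gives $1 + f = 5/3$, the value quoted above for diatomic gases.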


Adiabatic Changes 


It often happens that work is done adiabatically, that is, without the transfer of 
heat. In this case the conservation of energy tells us that the work done by an 
expanding fluid must be balanced by a decrease in its internal energy $\mathcal{E}V$:

$$0 = p\,dV + d(\mathcal{E}V) = (p + \mathcal{E})\,dV + V\,d\mathcal{E}. \tag{2.1.11}$$

For an ideal gas, the internal energy per unit volume $\mathcal{E}$ is given by Eq. (2.1.10),
so this tells us that

$$0 = \gamma p\,dV + V\,dp$$

and so, in an adiabatic process,

$$p \propto V^{-\gamma} \propto \rho^{\gamma}, \tag{2.1.12}$$

or, since for a fixed mass $p \propto T/V$,

$$T \propto V^{1-\gamma} \propto \rho^{\gamma - 1}. \tag{2.1.13}$$

This is in contrast with an isothermal process, for which $T$ is constant and
$p \propto V^{-1}$.

Equation (2.1.12) has an immediate consequence for the speed of sound. At
audible frequencies the conduction of heat is typically too slow to be effective,
so the expansion and compression of a fluid carrying a sound wave is adiabatic.
It is a standard result of hydrodynamics, proved by Newton, that the speed of
sound is

$$c_s = \sqrt{\frac{\partial p}{\partial \rho}}.$$

Newton thought that $p$ would be proportional to $\rho$ in a sound wave, which would
give $c_s = \sqrt{p/\rho}$, but in fact at audible frequencies the pressure is given by the
adiabatic relation (2.1.12), and $c_s$ is larger than Newton’s value by a factor $\sqrt{\gamma}$.
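
To see the size of this correction, the sketch below compares the two formulas for air. The pressure, density, and value of $\gamma$ are assumed round numbers for air near sea level and room temperature, not values taken from the text.

```python
import math


def newton_sound_speed(p, rho):
    """Newton's estimate, assuming p proportional to rho: c_s = sqrt(p / rho)."""
    return math.sqrt(p / rho)


def adiabatic_sound_speed(p, rho, gamma):
    """Adiabatic result: with p proportional to rho**gamma, dp/drho = gamma * p / rho."""
    return math.sqrt(gamma * p / rho)


# Assumed round values for air (illustration only).
P_AIR = 1.013e5    # pressure in newtons per square meter
RHO_AIR = 1.2      # density in kilograms per cubic meter
GAMMA_AIR = 1.4    # specific heat ratio for a diatomic gas

c_newton = newton_sound_speed(P_AIR, RHO_AIR)
c_adiabatic = adiabatic_sound_speed(P_AIR, RHO_AIR, GAMMA_AIR)
print(f"Newton: {c_newton:.0f} m/s, adiabatic: {c_adiabatic:.0f} m/s")
```

With these inputs Newton's formula gives roughly 290 meters per second, while the adiabatic formula gives roughly 340 meters per second, larger by the factor $\sqrt{1.4} \approx 1.18$.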


2.2 Absolute Temperature 


We have been casually discussing temperature, but what precisely do we mean 
by this? It is not hard to give a precise meaning to a statement that one body has a 
higher temperature than another, by a generalization of common experience that 
is sometimes known as the second law of thermodynamics (the first law being 
the conservation of energy). Observation of heat flow shows that if heat can flow 
spontaneously from a body A to a body B, then it cannot flow spontaneously 
from B to A. We can say then that the temperature $t_A$ of A is higher than
the temperature $t_B$ of B. Likewise, we say that two bodies are at the same
temperature if heat cannot flow from either one to the other spontaneously, 
without work being done on these bodies. Temperature defined in this way is 
observed to be transitive: If heat can flow spontaneously from a body A to a 
body B, and from B to a body C, then it can flow spontaneously from A to C. 
This is a property shared with real numbers — if a number a is larger than number 
b, and if b is larger than c, then a is larger than c — and is a necessary condition 
for temperatures to be represented by real numbers. 

But this does not give a precise meaning to any particular numerical value of 
temperature, or even to numerical ratios of temperatures. If, for some definition 
of temperature $t$, a comparison of values of $t$ tells us the direction of heat flow
then the same would be true of any monotonic function $\mathcal{T}(t)$. Conventionally
temperatures are defined by thermometers. With a column of some liquid such 
as mercury or alcohol in a glass tube, we mark off the heights of the column 
when the tube is placed in freezing or boiling water, and for a Celsius temperature
scale etch on the tube marks that divide the distance from freezing to
boiling into a hundred equal parts. The trouble is that different liquids expand
differently with increasing temperature, and the temperatures measured in this 
way with a mercury or alcohol thermometer will not be precisely equal. We can 
try instead to give significance to numerical values of temperature by using a 
gas thermometer, relying on the ideal gas law $pV = MRT/\mu$, but this law
is approximate, holding precisely only for molecules of negligible size that 
interact only in contact in collisions. How can we give precise meaning to 
numerical values of temperature without relying on approximate relations? 

Surprisingly, as shown by Rudolf Clausius in his 1850 paper¹ “On the Moving
Force of Heat,” it is possible to find a definition of temperature $T$ with
absolute significance by the study of thermodynamic engines known as Carnot
cycles.

Sadi Carnot (1796-1832) was a French military engineer, the son of Lazare 
Carnot, organizer of military victory in the French Revolution, and uncle of 
a later president of the Third Republic. In 1824 Carnot in Reflections on the 
Motive Power of Fire set out to study the efficiency of steam engines, explaining 
that “Already the steam engine works our mines, impels our ships, excavates 
our ports and our rivers, forges iron, fashions wood, grinds grains, spins and 
weaves our clothes, transports our heaviest burdens, etc.” (A few years later he 
might also have mentioned the beginning of steam-propelled locomotives, with 
the opening of the Liverpool—Manchester railroad in 1830.) Carnot invented an 
idealized engine, known as a Carnot cycle, which as we shall see is maximally 
efficient and provides a natural definition of absolute temperature. 

In the Carnot cycle, a working fluid (such as steam in a cylinder fitted with a 
piston) goes through four frictionless steps: 


1. Isothermal: The working fluid does work on its environment, for instance
by pushing a piston against external pressure, but keeping a constant temperature
by absorbing heat $Q_2$ from a hot reservoir at temperature $t_2$. (We
will continue to use lower case $t$ to indicate temperature defined in any
way that indicates the direction of heat flow, without specifying any physical
significance to its particular numerical values.)

2. Adiabatic: The working fluid, perfectly insulated from its environment and
with no internal friction, does more work, with its temperature dropping to
the temperature $t_1$ of a cold reservoir but with no heat flowing in or out.

3. Isothermal: Work is done on the fluid, for instance by pushing in the piston,
with its temperature kept constant by its giving up heat $Q_1$ to the cold
reservoir.

4. Adiabatic: With the working fluid again completely insulated from its environment,
work is done on it, bringing its volume back down to its original
value and its temperature back up to the temperature $t_2$ of the hot reservoir.


! This paper is reprinted in Brush, The Kinetic Theory of Gases — An Anthology of Classic Papers with 
Historical Commentary, listed in the bibliography. 

Figure 2.1 A Carnot cycle (not drawn to scale).


A graph of the pressure versus the volume of the working fluid in this cycle is
a closed curve, with the net work $W$ done on the environment equal to $\oint p\,dV$,
that is, to the area enclosed by the curve. (See Figure 2.1.) As long as steps 2
and 4 are truly adiabatic, the conservation of energy tells us that this work is

$$W = Q_2 - Q_1 \tag{2.2.1}$$

and the efficiency of this cycle is

$$W/Q_2 = \frac{Q_2 - Q_1}{Q_2}. \tag{2.2.2}$$

(We call this the efficiency, having in mind that, as for a steam engine, we have
to pay for the heat $Q_2$ taken up at the higher temperature $t_2$ while the heat $Q_1$
given up at the lower temperature $t_1$ is wasted.)
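
Equation (2.2.1) can be checked numerically for one particular working fluid. The sketch below assumes an ideal gas obeying $pV = cT$, with $c$ standing for $MR/\mu$ and $T$ the gas-thermometer temperature, and uses the adiabatic relation (2.1.12). These are assumptions made for illustration; the argument in the text does not depend on any particular fluid. The work is found by integrating $p\,dV$ around the four steps, and the heats $Q_2$ and $Q_1$ are taken equal to the work done along each isotherm, since the internal energy of an ideal gas does not change at fixed temperature.

```python
import math


def integrate(p_of_v, v_start, v_end, steps=20_000):
    """Midpoint-rule estimate of the integral of p dV from v_start to v_end."""
    dv = (v_end - v_start) / steps
    return sum(p_of_v(v_start + (i + 0.5) * dv) for i in range(steps)) * dv


def carnot_cycle(c, gamma, t_hot, t_cold, v_a, v_b):
    """Net work and heats for an ideal-gas Carnot cycle (assumes p V = c T)."""
    # Along an adiabat T * V**(gamma - 1) is constant, which fixes the other two corners.
    stretch = (t_hot / t_cold) ** (1.0 / (gamma - 1.0))
    v_c, v_d = v_b * stretch, v_a * stretch

    def isotherm(t):
        return lambda v: c * t / v

    def adiabat(t0, v0):
        return lambda v: (c * t0 / v0) * (v0 / v) ** gamma

    work = (integrate(isotherm(t_hot), v_a, v_b)            # step 1
            + integrate(adiabat(t_hot, v_b), v_b, v_c)      # step 2
            + integrate(isotherm(t_cold), v_c, v_d)         # step 3
            + integrate(adiabat(t_cold, v_d), v_d, v_a))    # step 4
    q_hot = c * t_hot * math.log(v_b / v_a)     # heat Q_2 absorbed in step 1
    q_cold = c * t_cold * math.log(v_c / v_d)   # heat Q_1 given up in step 3
    return work, q_hot, q_cold


# Hypothetical inputs in arbitrary consistent units.
work, q_hot, q_cold = carnot_cycle(c=1.0, gamma=5.0 / 3.0,
                                   t_hot=400.0, t_cold=300.0, v_a=1.0, v_b=2.0)
print(f"W = {work:.4f}, Q2 - Q1 = {q_hot - q_cold:.4f}, W/Q2 = {work / q_hot:.4f}")
```

The work done in the two adiabatic steps cancels, so the numerically integrated area of the closed curve agrees with $Q_2 - Q_1$ to within the accuracy of the integration.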

Any Carnot cycle is reversible, because any frictionless adiabatic or isother- 
mal process follows the same track, depending only on its endpoints, whichever 
direction the process takes. But not all thermodynamic cycles, which take a 
working fluid through a series of steps back to the original temperature and 
volume, are reversible even though of course they all conserve energy. For 
reversibility it is not enough that all steps be either isothermal or adiabatic — 
there also should be no friction, which if present would provide an internal 
source of heat that is not available to do work. 

The importance of Carnot cycles in thermodynamics rests on the following
theorem:

I. The efficiency of the Carnot cycle $C$ described above is at least as great as
that for any general thermodynamical cycle $C'$, not necessarily reversible, which
begins with the working fluid absorbing a heat $Q_2'$ from a reservoir at the same
high temperature $t_2$, then emitting heat at the same lower temperature $t_1$, and
then returning to its original temperature and volume, in the process doing net
work $W'$. That is,

$$W/Q_2 \ge W'/Q_2'. \tag{2.2.3}$$

II. All Carnot cycles that take heat from a reservoir at the same temperature $t_2$,
using it to do work, and giving up waste heat to a reservoir at the same lower
temperature $t_1$, have the same efficiency, which depends only on $t_1$ and $t_2$.


Proof:² Like any positive real number, the ratio of the work done in the Carnot 
cycle C and in a general cycle C' can be approximated to an arbitrary accuracy 
by a ratio of positive integers N and N': 

$$ \frac{W}{W'} = \frac{N'}{N} . \tag{2.2.4} $$


Since any Carnot cycle by definition is reversible, the cycle C has an inverse 
$C^{-1}$. This is a refrigeration cycle, following the same steps as for C but in 
the opposite order, so that by doing work W on the fluid an amount of heat $Q_1$ 
is taken from the reservoir at temperature $t_1$ and heat $Q_2 > Q_1$ is delivered 
to the reservoir at temperature $t_2 > t_1$. Suppose we perform a compound cycle 
$C^*$, consisting of N repetitions of $C^{-1}$ and N' repetitions of C'. According 
to Eq. (2.2.4), the net work done by the working fluid is 

$$ W^* = N'W' - NW = 0 . $$

Also, the net heat taken from the hot reservoir at temperature $t_2$ is 

$$ Q_2^* = N'Q_2' - NQ_2 . $$

Now, since no work is done in the compound cycle, according to the fundamental 
property of temperature t, it is not possible for positive-definite net heat to 
be transferred to a reservoir at temperature $t_2$ from a lower temperature 
$t_1$, so the net heat $Q_2^*$ taken from the hot reservoir in the cycle $C^*$ must 
be positive-definite or zero. Hence, using Eq. (2.2.4), 

$$ 0 \le \frac{Q_2^*}{NW} = \frac{N'Q_2' - NQ_2}{NW} = \frac{Q_2'}{W'} - \frac{Q_2}{W} , $$


2 This treatment and that of the following section is based on that given by Fermi, Thermodynamics, listed 
in the bibliography. 

and therefore 

$$ \frac{W}{Q_2} \ge \frac{W'}{Q_2'} , $$

as was to be proved in the first part of the theorem. 


As to the second part of the theorem, note that if C' is also a Carnot cycle 
then, by the same reasoning, 

$$ \frac{W'}{Q_2'} \ge \frac{W}{Q_2} , \tag{2.2.5} $$

so the efficiencies are equal: 

$$ \frac{W}{Q_2} = \frac{W'}{Q_2'} . \tag{2.2.6} $$


This has now been proved for any pair of Carnot cycles, operating between the 
same temperatures $t_2$ and $t_1$, whatever the values of the heat taken from the 
reservoir at temperature $t_2$ and given up to the reservoir at temperature 
$t_1$, so the common efficiency can only depend on $t_2$ and $t_1$, as was to be 
proved. 

We shall write this relation in terms of the inefficiency: 

$$ 1 - \frac{W}{Q_2} = \frac{Q_1}{Q_2} = F(t_1, t_2) , \tag{2.2.7} $$

with F the same function for all Carnot cycles. We next prove that the function 
$F(t_1, t_2)$ takes the form 

$$ F(t_1, t_2) = T(t_1)/T(t_2) \tag{2.2.8} $$


for some function T(t). For this purpose we consider a compound cycle consisting 
of a Carnot cycle operating between the temperatures $t_2$ and $t_0 < t_2$ followed 
by a Carnot cycle operating between the temperatures $t_0$ and $t_1 < t_0$, with all 
the waste heat that is given to the reservoir at temperature $t_0$ in the first 
cycle taken up from this reservoir in the second cycle. Since 
$(Q_0/Q_2)(Q_1/Q_0) = Q_1/Q_2$, the inefficiency (2.2.7) of the compound cycle is 
the product of the inefficiencies of the individual cycles, so 

$$ F(t_1, t_2) = F(t_1, t_0)\,F(t_0, t_2) . \tag{2.2.9} $$

From Eq. (2.2.7) it is evident that $F(t_2, t_0)\,F(t_0, t_2) = 1$, so Eq. (2.2.9) 
may be written 

$$ F(t_1, t_2) = \frac{F(t_1, t_0)}{F(t_2, t_0)} . \tag{2.2.10} $$

This holds for any $t_0$ with $t_2 > t_0 > t_1$, so we can define 
$T(t) = F(t, t_0)$ with an arbitrary choice of $t_0$ in this range, and then 
Eq. (2.2.10) is the desired result (2.2.8). 

Now, efficiencies are never greater than 100%, so the ratio 
$F(t_1, t_2) = T(t_1)/T(t_2)$ in Eq. (2.2.7) must be positive, and so T(t) has 
the same sign for all temperatures. Since only the ratios of the T's appear in 
the efficiency, we are free to choose this sign to be positive, so that 
$T(t) > 0$ for all t. Also, inefficiencies are never greater than 100%, so 
Eq. (2.2.8) shows that $T(t_1) < T(t_2)$ for any $t_1$ and $t_2$ with 
$t_1 < t_2$. That is, T(t) is a monotonically increasing function of t and can 
therefore be used to judge the direction of spontaneous heat flow as well as t 
itself. 

We can therefore define the absolute temperature T by just using T(t) as 
the temperature in place of t. That is, using Eqs. (2.2.7) and (2.2.8), we define 
absolute temperature T by the statement that a Carnot cycle running between 
any two temperatures $T_2$ and $T_1$ has 

$$ \frac{Q_1}{Q_2} = \frac{T_1}{T_2} . \tag{2.2.11} $$

A Carnot cycle running between an upper temperature $T_2$ and a lower 
temperature $T_1$ has an efficiency 

$$ \frac{W}{Q_2} = \frac{Q_2 - Q_1}{Q_2} = \frac{T_2 - T_1}{T_2} . \tag{2.2.12} $$

Of course, this only defines T up to a constant factor, leaving us free to use 
what units we like for temperature. But we are not free to shift T(t) by adding 
a constant term. Indeed, since in this Carnot cycle heat flows from a reservoir at 
temperature $T_2$ to one at temperature $T_1$, we must have $T_2 > T_1$, and 
therefore in order for the efficiency (2.2.11) to be a positive quantity, the 
lower temperature must have $T_1 > 0$. Because any heat reservoir must have T 
positive-definite, we see that T is the absolute temperature, in the same sense 
as was found for gases by Charles. 

The temperature defined by Carnot cycles is identical (up to a choice of 
units) to the temperature given by a gas thermometer, which for the moment 
we will call $T^g$, in the approximation that the gas is ideal. To see this, let us 
label the states of the gas as A at the start of the isothermal expansion 1 (and 
at the end of the adiabatic compression 4); as B at the start of the adiabatic 
expansion 2 (and the end of the isothermal expansion 1); as C at the start of 
the isothermal compression 3 (and the end of the adiabatic expansion 2); and 
as D at the start of the adiabatic compression 4 (and the end of the isothermal 
compression 3). Since the expansion from A to B is isothermal, during this 
phase the internal energy of the gas, which is given by Eqs. (1.2.3) and (2.1.10) 
as $\mathcal{E}V = RT^g M/(\gamma - 1)\mu$, does not change, and so the heat drawn 
from the hot reservoir is the work done: 

$$ Q_2 = \int_A^B p\,dV = \frac{MRT_2^g}{\mu}\int_A^B \frac{dV}{V} = \frac{MRT_2^g}{\mu}\,\ln\!\left(\frac{V_B}{V_A}\right) . $$

Likewise, the heat given up to the cold reservoir in the isothermal compression 
from C to D is 

$$ Q_1 = \frac{MRT_1^g}{\mu}\,\ln\!\left(\frac{V_C}{V_D}\right) . $$

Further, since the expansion from B to C and the contraction from D to A are 
adiabatic, Eq. (2.1.13) gives $V \propto (T^g)^{-1/(\gamma-1)}$, and so during these 
parts of the cycle 

$$ V_C/V_B = V_D/V_A = \left(\frac{T_2^g}{T_1^g}\right)^{1/(\gamma-1)} , $$

and therefore $V_B/V_A = V_C/V_D$, and the logarithmic factors in $Q_2$ and $Q_1$ 
are equal. The efficiency is then 

$$ \frac{Q_2 - Q_1}{Q_2} = \frac{T_2^g - T_1^g}{T_2^g} , $$

in agreement with Eq. (2.2.12) if $T = T^g$, up to a possible constant factor. 
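
The ideal-gas calculation above can be checked numerically. The following is a minimal sketch, not part of the original text: the function name, the use of moles (the ratio M/μ) instead of mass, and the sample temperatures and volumes are all illustrative assumptions.

```python
import math

R = 8.314  # molar gas constant in J/(mol K)

def carnot_ideal_gas(n_mol, gamma, T_hot, T_cold, V_A, V_B):
    """Heats and efficiency for an ideal-gas Carnot cycle A -> B -> C -> D -> A.

    n_mol plays the role of M/mu in the text (an assumption of this sketch).
    """
    # On the adiabats V * T**(1/(gamma-1)) is constant, so V_C/V_B = V_D/V_A.
    ratio = (T_hot / T_cold) ** (1.0 / (gamma - 1.0))
    V_C = V_B * ratio
    V_D = V_A * ratio
    Q_hot = n_mol * R * T_hot * math.log(V_B / V_A)    # heat absorbed on A -> B
    Q_cold = n_mol * R * T_cold * math.log(V_C / V_D)  # heat given up on C -> D
    efficiency = (Q_hot - Q_cold) / Q_hot
    return Q_hot, Q_cold, efficiency

# Monatomic gas (gamma = 5/3) between 600 K and 300 K, doubling its volume on A -> B.
Q_hot, Q_cold, eff = carnot_ideal_gas(1.0, 5.0 / 3.0, 600.0, 300.0, 1.0e-3, 2.0e-3)
print(Q_hot, Q_cold, eff)
```

With these inputs the efficiency should come out as 1 - 300/600, whatever values are chosen for gamma and for the two volumes, which is the content of the agreement with Eq. (2.2.12).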


2.3 Entropy 


In macroscopic classical thermodynamics we characterize the state of a system 
by a set of variables that can be specified independently. For instance, for a fluid 
of fixed mass and chemical composition, in a vessel with adjustable volume 
(say, with a movable piston) the state is specified by giving the values of any 
two of the thermodynamic variables — pressure, volume, temperature, energy, 
etc. — the remaining variables being determined in equilibrium or in adiabatic 
variations by these two values and some equation of state, such as the ideal 
gas law (1.2.3). Many of the consequences of macroscopic classical thermody- 
namics can be deduced from the existence of another thermodynamic variable, 
known as the entropy, introduced in 1854 by Rudolf Clausius, that like other 
thermodynamic variables depends only on the state of the system, although its 
definition seems to indicate that it also depends on the way that the system is 
prepared. 

Suppose a system is prepared in a given state 1 by starting with it in a standard 
state (labeled 0 below) and then taking it to 1 on a path P through the space 
of independent variables used to define thermodynamic states, in which by a 
series of small reversible changes at varying absolute temperatures T it picks up 
small net amounts dQ of heat energy from the environment. (The heat energy 
increment dQ is taken as positive if the system takes heat from the environment 
and negative if it gives up heat.) Then the entropy of this state is defined by 

$$ S_1 = S_0 + \int_P \frac{dQ}{T} , \tag{2.3.1} $$

where the integral is taken over any reversible path from state 0 to state 1, and 
$S_0$ is whatever entropy we choose to ascribe to the standard state 0. The 
remarkable thing is that the integral here is independent of the particular 
reversible path chosen, so that this really defines the entropy $S_1$ up to a 
common constant term $S_0$ as a function of the state of the system, not of how it 
is prepared, provided that T is the absolute temperature defined, as in the 
previous section, by the efficiency of Carnot cycles. Furthermore, with the 
entropy defined in this way, for any path P' from state 0 to state 1 that may or 
may not be reversible, we have 

$$ \int_{P'} \frac{dQ}{T} \le S_1 - S_0 . \tag{2.3.2} $$


Proof: The first step in proving these results is to prove the following lemma: 
for an arbitrary cycle, reversible or irreversible, that takes a system 
$\mathcal{S}$ from any state back to the same state, taking in and giving up heat 
at various temperatures, we have 

$$ \oint \frac{dQ}{T} \le 0 . \tag{2.3.3} $$


After establishing this lemma, the rest of the proof will be straightforward. 

To prove this lemma we can approximate the cycle by a sequence of brief 
isothermal steps, in each of which the system $\mathcal{S}$ takes in heat (if dQ is 
positive) or gives heat up (if dQ is negative) at a momentary temperature T. We 
can imagine that, at each step, the heat taken in or given up is given up or taken 
in by another system, which undergoes a Carnot cycle between the momentary 
temperature T and a fixed temperature $T_0$. In this Carnot cycle, the ratio of the 
heat dQ given up by the Carnot cycle to $\mathcal{S}$ and the heat $dQ_0$ taken by 
the Carnot cycle from the reservoir at temperature $T_0$ is given by Eq. (2.2.11): 

$$ \frac{dQ}{dQ_0} = \frac{T}{T_0} , $$

or in other words 

$$ dQ_0 = T_0 \left(\frac{dQ}{T}\right) . $$

Hence in the complete cycle the Carnot cycles take in a total net heat 
$T_0 \oint dQ/T$ from the reservoir at temperature $T_0$. Since the system 
$\mathcal{S}$ and each of the Carnot cycles return to their original states, if 
this heat taken in at temperature $T_0$ were positive-definite then it would have 
to go into work, which is impossible since work cannot be done by taking heat from 
a reservoir at a fixed temperature with no changes elsewhere. (If it could, then 
this work by producing friction could be used to transfer some heat to any body, 
even one at a temperature higher than $T_0$.) So we conclude that the integral 
$\oint dQ/T$ is at most zero, as was to be shown. 
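
As a simple illustration of the lemma (2.3.3), consider a cycle that exchanges heat with only two reservoirs, so that the cyclic integral reduces to a two-term sum. The sketch below is an illustrative assumption rather than anything from the text: the helper name `clausius_sum`, the reservoir temperatures, and the choice of a 25% efficient irreversible engine are all made up for the example.

```python
def clausius_sum(heat_exchanges):
    """Sum of dQ/T over a cycle; dQ > 0 means heat taken in by the system."""
    return sum(dQ / T for dQ, T in heat_exchanges)

T_hot, T_cold = 500.0, 300.0
Q_in = 1000.0  # heat absorbed at T_hot, in joules

# Reversible (Carnot) engine: Q_out / Q_in = T_cold / T_hot, as in Eq. (2.2.11).
reversible = [(Q_in, T_hot), (-Q_in * T_cold / T_hot, T_cold)]

# Irreversible engine: lower efficiency, so more heat is given up at T_cold.
efficiency = 0.25  # below the Carnot value 1 - 300/500 = 0.4
irreversible = [(Q_in, T_hot), (-(1.0 - efficiency) * Q_in, T_cold)]

sum_reversible = clausius_sum(reversible)
sum_irreversible = clausius_sum(irreversible)
print(sum_reversible, sum_irreversible)
```

The reversible cycle saturates the inequality, while the less efficient engine gives a strictly negative sum. An engine more efficient than the Carnot value would give a positive sum, which the lemma forbids.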

The rest is easy. Note that if two paths P and P' are both reversible paths 
that go from state 0 to state 1, then $P {P'}^{-1}$ is a closed cycle, where 
${P'}^{-1}$ is path P' taken in reverse, from state 1 to state 0. It follows then 
from the inequality (2.3.3) that 

$$ 0 \ge \oint_{P {P'}^{-1}} \frac{dQ}{T} = \int_P \frac{dQ}{T} + \int_{{P'}^{-1}} \frac{dQ}{T} = \int_P \frac{dQ}{T} - \int_{P'} \frac{dQ}{T} . $$

But $P' P^{-1}$ is also a closed cycle, so 

$$ 0 \ge \oint_{P' P^{-1}} \frac{dQ}{T} = \int_{P'} \frac{dQ}{T} - \int_P \frac{dQ}{T} . $$

These two results are consistent only if for reversible paths both cyclic 
integrals vanish, in which case 

$$ \int_P \frac{dQ}{T} = \int_{P'} \frac{dQ}{T} . \tag{2.3.4} $$

We can therefore define the entropy up to an additive constant as in Eq. (2.3.1), 
where P is any reversible path. 

Finally, if P' is a general path from state 0 to state 1, reversible or 
irreversible, while P is a reversible path from state 0 to state 1, then 
$P' P^{-1}$ (but not necessarily $P {P'}^{-1}$!) is a closed cycle, so the 
inequality (2.3.3) gives 

$$ 0 \ge \oint_{P' P^{-1}} \frac{dQ}{T} = \int_{P'} \frac{dQ}{T} - \int_P \frac{dQ}{T} , $$

and therefore, using Eq. (2.3.1), 

$$ \int_{P'} \frac{dQ}{T} \le S_1 - S_0 , $$

as was to be shown. 

In the special case of a completely isolated system $\mathcal{S}$, no heat can be 
taken into $\mathcal{S}$ or given up by $\mathcal{S}$, so the integrand in the 
integral on the left-hand side of Eq. (2.3.2) must vanish and therefore 
$S_1 \ge S_0$. In isolated systems the entropy can only increase. On the other 
hand, if an isolated system is undergoing only reversible changes, then according 
to Eq. (2.3.1) the entropy is constant. 

There is another definition of entropy, used in information theory as well as 
in physics. If a system can be in any one of a number of states characterized 
by a continuous (generally multidimensional) parameter $\alpha$, with a probability 
$P(\alpha)\,d\alpha$ of being in states with this parameter in a narrow range 
$d\alpha$ around $\alpha$, then the entropy is 

$$ S = -k \int P(\alpha)\,\ln P(\alpha)\,d\alpha , \tag{2.3.5} $$

where k is the universal constant, known as Boltzmann's constant, appearing 
in Eq. (1.2.1). As we shall see in the next section, according to kinetic theory, 
with a suitable choice of $S_0$ the thermodynamic entropy (2.3.1) equals the 
information-theoretic entropy (2.3.5). 
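
To make the definition (2.3.5) concrete, the integral can be evaluated numerically for a one-dimensional Gaussian distribution, for which the result in units of k is known in closed form, $\tfrac{1}{2}\ln(2\pi e\sigma^2)$. This is only a sketch under stated assumptions: the Gaussian, the width, the midpoint-rule integrator, and the function names are illustrative choices and do not come from the text.

```python
import math

def gaussian(a, sigma):
    """Normalized Gaussian probability density with standard deviation sigma."""
    return math.exp(-a * a / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def entropy_over_k(density, lo, hi, steps=20000):
    """Midpoint-rule estimate of -integral of P ln P, i.e. the entropy in units of k."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = density(lo + (i + 0.5) * h)
        if p > 0.0:
            total -= p * math.log(p) * h
    return total

sigma = 1.5
numeric = entropy_over_k(lambda a: gaussian(a, sigma), -10 * sigma, 10 * sigma)
exact = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
print(numeric, exact)
```

A broader distribution (larger sigma) gives a larger entropy, in line with the idea that entropy measures how widely the probability is spread over states.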

The mere fact that the entropy S defined by (2.3.1) depends only on the 
thermodynamic state has far-reaching consequences. Consider a fixed mass of fluid 
in a vessel with variable volume. The independent thermodynamic variables 
here can be taken as the volume V and the temperature T, with pressure p, 
internal energy E, and entropy S all functions of V and T. The work done by 
the fluid pressure p in increasing the fluid volume by a small amount dV is 
p dV, so the heat required to change the temperature by an infinitesimal amount 
dT and the volume by an infinitesimal amount dV is 

$$ dQ = dE + p\,dV , $$

so according to Eq. (2.3.1), the change in the entropy is given by 

$$ T\,dS = dE + p\,dV . \tag{2.3.6} $$


In other words 

$$ \frac{\partial S(V,T)}{\partial T} = \frac{1}{T}\frac{\partial E(V,T)}{\partial T} , \tag{2.3.7} $$

$$ \frac{\partial S(V,T)}{\partial V} = \frac{1}{T}\frac{\partial E(V,T)}{\partial V} + \frac{p(V,T)}{T} . \tag{2.3.8} $$

To squeeze information about pressure and internal energy from these formulas, 
we use the fact that partial derivatives commute. From Eq. (2.3.7) we have 

$$ \frac{\partial}{\partial V}\left(\frac{\partial S(V,T)}{\partial T}\right) = \frac{1}{T}\frac{\partial^2 E(V,T)}{\partial T\,\partial V} , $$

while, from Eq. (2.3.8), 

$$ \frac{\partial}{\partial T}\left(\frac{\partial S(V,T)}{\partial V}\right) = \frac{1}{T}\frac{\partial^2 E(V,T)}{\partial T\,\partial V} - \frac{1}{T^2}\frac{\partial E(V,T)}{\partial V} + \frac{\partial\,(p(V,T)/T)}{\partial T} . $$

Setting these equal gives a relation between the derivatives of E and p: 

$$ 0 = -\frac{1}{T^2}\frac{\partial E(V,T)}{\partial V} + \frac{\partial\,(p(V,T)/T)}{\partial T} . \tag{2.3.9} $$

This is for a fixed mass. Since E(V, T) is an extensive variable, it must be 
proportional to this mass but does not otherwise have to depend on volume. 
In fact, it is frequently a good approximation to suppose that, apart from its 
proportionality to mass, E(V, T) is independent of volume. This is the case if 
the fluid consists of infinitesimal particles that interact only in contact in 
collisions; since there is nothing with the dimensions of length that can enter in 
the calculation of the energy, E(V, T) cannot here depend on volume. In this case 
Eq. (2.3.9) yields Charles' law, that for fixed volume V the pressure p(V, T) is 
proportional to T. This shows again that the absolute temperature T in the ideal 
gas law (1.2.3) is the same up to a constant factor as the temperature T defined 
by the efficiency (2.2.11) of Carnot cycles. 
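
The relation (2.3.9) can also be tested on a fluid whose energy does depend on volume. The sketch below uses the van der Waals model for one mole, which is not discussed in the text and is brought in here purely as an assumption: $p = RT/(V-b) - a/V^2$ and $E = c_v T - a/V$, with illustrative parameter values. The derivatives are taken by central finite differences.

```python
R = 8.314                            # molar gas constant, J/(mol K)
a, b, c_v = 0.14, 3.2e-5, 1.5 * R    # illustrative van der Waals parameters (one mole)

def pressure(V, T):
    return R * T / (V - b) - a / V ** 2

def energy(V, T):
    return c_v * T - a / V

def residual(V, T, h=1e-5):
    """Right-hand side of Eq. (2.3.9): -(1/T^2) dE/dV + d(p/T)/dT, by central differences."""
    dV, dT = h * V, h * T
    dE_dV = (energy(V + dV, T) - energy(V - dV, T)) / (2 * dV)
    d_pT_dT = (pressure(V, T + dT) / (T + dT) - pressure(V, T - dT) / (T - dT)) / (2 * dT)
    return -dE_dV / T ** 2 + d_pT_dT

V, T = 1.0e-3, 300.0
scale = a / (V * T) ** 2   # size of each of the two terms taken separately
print(residual(V, T) / scale)
```

The two terms are each of order `scale` but cancel to within the finite-difference error, so this particular pair of p(V, T) and E(V, T) is consistent with the existence of an entropy function. Changing the energy to, say, `c_v * T` alone while keeping the same pressure would spoil the cancellation.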

Although this result was obtained without having a formula for the entropy, 
for some purposes it is useful actually to know what the entropy is. In a homo- 
geneous medium, the entropy S of any mass M of matter may conveniently be 
written as S = Ms, where s is the entropy per unit mass, a function of temper- 
ature and various densities known as the specific entropy. Dividing Eq. (2.3.6) 
by M, we have then 


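$$ T\,ds = d(\mathcal{E}/\rho) + p\,d(1/\rho) , \tag{2.3.10} $$

where as before $\mathcal{E} = E/V$ is the internal energy density and 
$\rho = M/V$ is the mass density. We consider an ideal gas, for which 
$T = p\mu/R\rho$ while $\mathcal{E}$ and p are related by Eq. (2.1.10): 
$\mathcal{E} = p/(\gamma - 1)$. Then Eq. (2.3.10) gives 

$$ \frac{p\mu}{R\rho}\,ds = \frac{1}{\gamma-1}\,d\!\left(\frac{p}{\rho}\right) + p\,d\!\left(\frac{1}{\rho}\right) , $$

so 

$$ ds = \frac{R}{(\gamma-1)\mu}\left[\frac{dp}{p} - \gamma\,\frac{d\rho}{\rho}\right] . $$

The solution is 

$$ s = \frac{R}{(\gamma-1)\mu}\,\ln\!\left(\frac{p}{\rho^\gamma}\right) + \text{constant} . \tag{2.3.11} $$

We see that the result of Section 2.2 that $p \propto \rho^\gamma$ for adiabatic 
processes is just the statement that s is constant in these processes, which of 
course it must be since in an adiabatic process the heat input dQ vanishes. 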
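
The formula (2.3.11) lends itself to two quick numerical checks: the specific entropy should stay fixed along an adiabat $p \propto \rho^\gamma$, and at fixed density $T\,ds$ should equal the change in energy per unit mass, $d(\mathcal{E}/\rho)$, as Eq. (2.3.10) requires. The following sketch assumes air-like values for μ and γ and sets the additive constant to zero; those choices, and the function name, are illustrative only.

```python
import math

R = 8.314      # molar gas constant, J/(mol K)
mu = 0.029     # kg/mol, roughly air (illustrative)
gamma = 1.4

def specific_entropy(p, rho):
    """Eq. (2.3.11) with the additive constant set to zero."""
    return R / ((gamma - 1.0) * mu) * math.log(p / rho ** gamma)

# Follow an adiabat p = K * rho**gamma: s should not change.
K = 1.0e5 / 1.2 ** gamma
s_values = [specific_entropy(K * rho ** gamma, rho) for rho in (0.5, 1.2, 3.0)]

# Heat the gas at fixed density: T ds should equal d(energy per unit mass).
rho, p, dp = 1.2, 1.0e5, 1.0
T = p * mu / (R * rho)
T_ds = T * (specific_entropy(p + dp, rho) - specific_entropy(p - dp, rho))
d_energy = 2.0 * dp / ((gamma - 1.0) * rho)
print(s_values, T_ds, d_energy)
```

The second check uses a symmetric pressure change of plus or minus `dp`, so both sides refer to the same total change of `2 * dp` in pressure.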

In many stars there are regions in which convection effectively mixes matter 
from various depths. Since heat conduction is usually ineffective in stars, little 
heat flows into or out of a bit of matter as it rises or falls, and so it keeps the 
same specific entropy. These regions therefore have a uniform specific entropy, 
and therefore a uniform value for the ratio $p/\rho^\gamma$. For instance, this is the case in 
the Sun for distances from the center greater than about 65% of the Sun’s radius 
out to a thin surface layer. 


Neutral Matter 


We have been mostly concerned with matter in which in each mass there is a 
non-vanishing conserved quantity, the number of particles. There is a different 
context, with no similar conserved numbers, in which thermodynamics yields 
more detailed information about pressure and energy. In the early universe, at 
temperatures above about $10^{10}$ K, there is so much energy in radiation and 
electron-antielectron pairs that the contribution to the energy of the excess of 
matter over antimatter may be neglected. Here there is no number density on 
which the pressure and energy density $\mathcal{E} = E/V$ can significantly depend, 
so here $E(V,T) = V\mathcal{E}(T)$ and $p(V,T) = p(T)$; thus here Eq. (2.3.9) is an 
ordinary differential equation for p(T): 

$$ 0 = -\frac{\mathcal{E}(T)}{T^2} + \frac{d}{dT}\left(\frac{p(T)}{T}\right) , $$

or, in other words, 

$$ p'(T) = \frac{\mathcal{E}(T) + p(T)}{T} . \tag{2.3.12} $$

Thermodynamics alone does not fix any relation between $\mathcal{E}(T)$ and p(T), 
but given such a relation this result gives both as functions of temperature. For 
instance, as an example of the power of thermodynamics, it was known in the 
nineteenth century as a consequence of Maxwell's theory of electromagnetism 
that the pressure of electromagnetic radiation is one-third of its energy density. 
Setting $p = \mathcal{E}/3$ in Eq. (2.3.12) gives 
$\mathcal{E}'(T) = 4\mathcal{E}(T)/T$, so 

$$ \mathcal{E}(T) = 3p(T) = aT^4 , \tag{2.3.13} $$

where a is a constant, known as the radiation energy constant. But, as we shall 
see in Section 3.1, it was not possible to understand the value of a until the 
advent of quantum mechanics in the early twentieth century. 
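
The step from Eq. (2.3.12) to the fourth-power law can be checked by integrating the differential equation numerically. The sketch below assumes a linear relation $p = w\mathcal{E}$ with $w = 1/3$ for radiation, works in arbitrary units, and uses a hand-written fourth-order Runge-Kutta integrator; the function names and step count are illustrative assumptions.

```python
def rhs(T, eps, w=1.0 / 3.0):
    """d(eps)/dT from Eq. (2.3.12) with p = w * eps, i.e. w * eps' = (1 + w) * eps / T."""
    return (1.0 + w) * eps / (w * T)

def integrate(T0, eps0, T1, steps=2000):
    """Classical fourth-order Runge-Kutta integration of eps(T) from T0 to T1."""
    h = (T1 - T0) / steps
    T, eps = T0, eps0
    for _ in range(steps):
        k1 = rhs(T, eps)
        k2 = rhs(T + h / 2, eps + h * k1 / 2)
        k3 = rhs(T + h / 2, eps + h * k2 / 2)
        k4 = rhs(T + h, eps + h * k3)
        eps += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        T += h
    return eps

ratio = integrate(1.0, 1.0, 2.0)   # energy density ratio when T doubles
print(ratio)
```

Doubling the temperature should multiply the energy density by $2^4$, as Eq. (2.3.13) states. The constant a itself drops out of this ratio, consistent with the remark that thermodynamics alone cannot fix its value.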


The Laws of Thermodynamics 


It is common to summarize the content of classical thermodynamics in three 
laws. As already mentioned, the first law is just the conservation of energy, 
discussed in the context of heat energy in Section 2.1, and the second, usually 
attributed to Clausius, on which the discussion of thermodynamic efficiency in 
Section 2.2 is based, can be stated as the principle that without doing work it is 
not possible to transfer heat from a cold reservoir to one at higher temperature. 
We have seen that this leads to the existence of a quantity, the entropy, which 
depends only on the thermodynamic state and satisfies Eq. (2.3.1) when 
reversible changes are made in this state. This can instead be taken as the 
second law of thermodynamics. 

There are several formulations of the third law, some given by Walther Nernst 
(1864-1941) in 1906-1912. The most fruitful, it seems, is that it is possible 
to assign a common value to the entropy (conventionally taken as zero) for 
all systems at absolute zero temperature, so that at absolute zero the integral 
in Eq. (2.3.1) must converge. This has the consequence, in particular, that the 
specific heat dQ/dT must vanish for $T \to 0$. This seems to contradict the 
results of Section 2.1 for ideal gases, which give a temperature-independent 
specific heat whether for fixed volume or fixed pressure. The contradiction is 
avoided in practice because no substance remains close to an ideal gas as the 
temperature approaches absolute zero. We will see when we come to quantum 
mechanics that if an otherwise free particle is confined in any fixed volume, then 
it cannot have precisely zero momentum, as required for a classical ideal gas at 
absolute zero temperature. On the other hand, solids can exist at absolute zero 
temperature, and in that limit their specific heats do approach zero. 


2.4 Kinetic Theory and Statistical Mechanics 


We saw in the previous chapter how by the mid nineteenth century the ideal gas 
law had been established through the work especially of Bernoulli and Clausius. 
But, though derived by considering the motions of individual gas molecules, in 
its conclusions it dealt only with bulk gas properties such as pressure, tem- 
perature, mass density, and energy density. For many purposes, including the 
calculation of chemical or transport processes, it was necessary to go further 
and work out the detailed probability distribution of the motion of individual 
gas particles. This was done in the kinetic theory of James Clerk Maxwell and 
Ludwig Boltzmann (1844-1906). Kinetic theory was later generalized to the 
formalism known as statistical mechanics, especially by the American theorist 
Josiah Willard Gibbs (1839-1906). As it turned out, these methods went a 
long way toward not only establishing a correspondence with thermodynamics 
but also explaining the principles of thermodynamics on the assumption that 
macroscopic matter is composed of very many particles, and thereby helping to 
establish the reality of atoms. 


The Maxwell-Boltzmann Distribution 


Maxwell in 1860 considered the form of the probability distribution function 
$P(v_x, v_y, v_z)$ for the x, y, and z components of the velocity of any molecule 
in a gas in equilibrium.³ The probability distribution function is defined so 
that the probability that these components are respectively between $v_x$ and 
$v_x + dv_x$, between $v_y$ and $v_y + dv_y$, and between $v_z$ and $v_z + dv_z$, is 
of the form 

$$ P(v_x, v_y, v_z)\,dv_x\,dv_y\,dv_z . $$


3 J. C. Maxwell, Phil. Mag. 19, 19; 20, 21 (1860). This article is included in Brush, The Kinetic Theory of 
Gases — An Anthology of Classic Papers with Historical Commentary, listed in the bibliography. 


34 2 Thermodynamics and Kinetic Theory 


He assumed (without offering a real justification) that the probability that any 
component of velocity of a particle is in a particular range is not correlated with 
the other components of the velocity. Then $P(v_x, v_y, v_z)$ must be proportional 
to a function of $v_x$ alone, with a coefficient that depends only on $v_y$ and 
$v_z$, and likewise for $v_y$ and $v_z$, so $P(v_x, v_y, v_z)$ must take the form of 
a product: 

$$ P(v_x, v_y, v_z) = f(v_x)\,g(v_y)\,h(v_z) . $$

Rotational symmetry requires further that P can depend only on the magnitude 
of the velocity, not on its direction, and hence only on $v_x^2 + v_y^2 + v_z^2$. 
The only function of $v_x^2 + v_y^2 + v_z^2$ that takes the form 
$f(v_x)\,g(v_y)\,h(v_z)$ is proportional to an exponential: 

$$ P(v_x, v_y, v_z) \propto \exp\left(-C(v_x^2 + v_y^2 + v_z^2)\right) . $$

The constant C must be positive in order that P should not blow up for large 
velocity, which would make it impossible to set the total probability equal to 
unity, as it must be. Taking C to be positive, and setting the total probability 
(the integral of P over all velocities) for each particle equal to one, gives the 
factor of proportionality: 

$$ P(v_x, v_y, v_z) = \left(\frac{C}{\pi}\right)^{3/2} \exp\left(-C(v_x^2 + v_y^2 + v_z^2)\right) . $$

We can use this to calculate the mean square velocity components: 

$$ \overline{v_x^2} = \overline{v_y^2} = \overline{v_z^2} = \frac{1}{2C} . $$

Clausius had introduced an absolute temperature T by setting 
$m\overline{v_\perp^2} = kT$, where k is a constant to be determined experimentally 
and $v_\perp$ is the component of the velocity in a direction normal to the 
container wall, which for an isotropic velocity distribution can be taken as any 
direction, so the constant C must be given by C = m/2kT and the Maxwell 
distribution takes the form 

$$ P(v_x, v_y, v_z) = \left(\frac{m}{2\pi kT}\right)^{3/2} \exp\left(-m(v_x^2 + v_y^2 + v_z^2)/2kT\right) . \tag{2.4.1} $$

As we saw at the end of Section 2.2, the requirement that in an ideal gas 
$m\overline{v_\perp^2} = kT$, which led here to C = m/2kT, also ensures that, up to 
an arbitrary constant factor, T is the absolute temperature defined by the 
efficiency of Carnot cycles. 
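
Because the distribution (2.4.1) is a product of three identical one-component factors, its normalization and the mean square velocity can be checked with one-dimensional integrals. The sketch below is an illustrative assumption, not part of the text: it takes a molecular mass roughly that of N2, a temperature of 300 K, and a simple midpoint-rule integrator.

```python
import math

k = 1.380649e-23   # Boltzmann constant, J/K
m = 4.65e-26       # kg, roughly one N2 molecule (illustrative)
T = 300.0          # K
C = m / (2.0 * k * T)

def f(v):
    """One-component factor of Eq. (2.4.1): sqrt(C/pi) * exp(-C v^2)."""
    return math.sqrt(C / math.pi) * math.exp(-C * v * v)

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule integral of func from lo to hi."""
    h = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * h) for i in range(steps)) * h

v_max = 10.0 * math.sqrt(k * T / m)   # far enough out that the tails are negligible
norm = integrate(f, -v_max, v_max)
mean_sq = integrate(lambda v: v * v * f(v), -v_max, v_max)
print(norm, mean_sq, k * T / m)
```

The second integral reproduces $\overline{v_x^2} = 1/2C = kT/m$, which corresponds to a root mean square velocity component of about 300 m/s for these inputs.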
The formula for the probability distribution P was derived in 1868 in a more 
convincing way by Boltzmann.⁴ He defined a quantity 


4 L. Boltzmann, Sitz. Ber. Akad. Wiss. (Vienna), part II, 66, 875 (1872). A translation into English of this 
article is included in Brush, The Kinetic Theory of Gases — An Anthology of Classic Papers with Historical 
Commentary, listed in the bibliography. 

$$ H \equiv \overline{\ln P} \equiv \int_{-\infty}^{+\infty} dv_x \int_{-\infty}^{+\infty} dv_y \int_{-\infty}^{+\infty} dv_z\, P(\mathbf{v})\,\ln P(\mathbf{v}) , $$

and showed that collisions of gas particles always lead to a decrease in H until 
a minimum is reached, at which $P(\mathbf{v})$ is the Maxwell-Boltzmann 
distribution function. A generalization of this H-theorem was given in 1901 by 
Gibbs.⁵ The generalization and proof are given below, along with the application 
to gases. 


The General H-Theorem 


Consider a large system with many degrees of freedom, such as a gas with many 
molecules (but not necessarily a gas). The states of the system are parameterized 
by many variables, which we summarize with a symbol $\alpha$. (For instance, for a 
monatomic gas $\alpha$ stands for the set of positions $\mathbf{x}_1$, 
$\mathbf{x}_2$, etc. and momenta $\mathbf{p}_1$, $\mathbf{p}_2$, etc. of atoms 
1, 2, ... For a gas of multi-atom molecules, $\alpha$ would also include 
the orientations and their rates of change for each molecule.) We denote an 
infinitesimal range of these parameters by $d\alpha$. (For instance, for a 
monatomic gas $d\alpha$ stands for the product 
$d^3x_1\,d^3p_1\,d^3x_2\,d^3p_2 \cdots$, known as the phase space volume.) We define 
$P(\alpha)$ so that the probability that the parameters of the system are in an 
infinitesimal range $d\alpha$ around $\alpha$ is $P(\alpha)\,d\alpha$, with P 
normalized so that $\int d\alpha\,P(\alpha) = 1$. Define 

$$ H \equiv \overline{\ln P} \equiv \int P(\alpha)\,d\alpha\,\ln P(\alpha) . \tag{2.4.2} $$

Gibbs showed that H always decreases until it reaches a minimum value, at 
which $P(\alpha)$ is proportional to the exponential of a linear combination of 
conserved quantities, such as the total energy. 


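Proof: Define a differential rate $\Gamma(\alpha \to \beta)$ such that the rate at 
which a system in state $\alpha$ makes a transition to a state within a range 
$d\beta$ around state $\beta$ is $\Gamma(\alpha \to \beta)\,d\beta$. The probability 
$P(\alpha)\,d\alpha$ can either increase because the system in a range $d\beta$ of 
states around $\beta$ makes a transition to the range $d\alpha$ of states around 
$\alpha$, or decrease because the system in the range of states $d\alpha$ around 
$\alpha$ makes a transition to some other state in a range $d\beta$ around 
$\beta$, so 

$$ \frac{d\,[P(\alpha)\,d\alpha]}{dt} = \int d\beta\,\left[ P(\beta)\,\Gamma(\beta \to \alpha)\,d\alpha - P(\alpha)\,d\alpha\,\Gamma(\alpha \to \beta) \right] , $$

or, cancelling the differentials $d\alpha$, 

$$ \frac{dP(\alpha)}{dt} = \int d\beta\,\left[ P(\beta)\,\Gamma(\beta \to \alpha) - P(\alpha)\,\Gamma(\alpha \to \beta) \right] . \tag{2.4.3} $$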


5 J. W. Gibbs, Elementary Principles of Statistical Mechanics, Developed with Especial Reference to The 
Rational Foundation of Thermodynamics (Scribner, New York, 1902). 




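(Note that this makes $\int d\alpha\,P(\alpha)$ time-independent, as it must be. 
Cancelling $d\alpha$ is justified because phase space volumes such as $d\alpha$ do 
not change with time.) Now use $(d/dt)\,y \ln y = (\ln y + 1)(dy/dt)$, which gives 
here 

$$ \frac{dH}{dt} = \int d\alpha \int d\beta\,(\ln P(\alpha) + 1)\left[ P(\beta)\,\Gamma(\beta \to \alpha) - P(\alpha)\,\Gamma(\alpha \to \beta) \right] . $$

Interchange $\alpha$ and $\beta$ in the second double integral arising from the 
second term in square brackets: 

$$ \frac{dH}{dt} = \int d\alpha \int d\beta\,P(\beta)\,\ln\!\left(\frac{P(\alpha)}{P(\beta)}\right)\Gamma(\beta \to \alpha) . \tag{2.4.4} $$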


Now use the inequality that $y \ln(x/y) \le x - y$ for any positive numbers x 
and y. (To prove this, note that $y \ln(x/y) - x + y$ vanishes for x = y, while its 
derivative with respect to x is $-(x - y)/x$, so it monotonically approaches zero 
from below for x < y and then decreases monotonically for x > y.) From this 
inequality, we have 

$$ \frac{dH}{dt} \le \int d\alpha \int d\beta\,[P(\alpha) - P(\beta)]\,\Gamma(\beta \to \alpha) . \tag{2.4.5} $$

Again interchange $\alpha$ and $\beta$, now in the first double integral: 

$$ \frac{dH}{dt} \le \int d\alpha \int d\beta\,P(\beta)\,[\Gamma(\alpha \to \beta) - \Gamma(\beta \to \alpha)] . \tag{2.4.6} $$


In the original proof it was assumed that the laws of physics are invariant under 
reversal of the direction of time's flow, and therefore 
$\Gamma(\beta \to \alpha) = \Gamma(\alpha \to \beta)$, so that Eq. (2.4.6) says that 
H decreases with time, in accord with the H-theorem. In studies of the decay of 
neutral K-mesons in 1964-1970 it was found that time-reversal invariance is not 
exact.⁶ Fortunately, the H-theorem survives, because on very general grounds in 
quantum mechanics it can be shown without using time-reversal invariance that⁷ 

$$ \int d\alpha\,[\Gamma(\alpha \to \beta) - \Gamma(\beta \to \alpha)] = 0 . \tag{2.4.7} $$

With Eq. (2.4.6), this is enough to require that $dH/dt \le 0$, as was to be shown. 
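
The H-theorem can be watched at work in a toy model with a handful of discrete states in place of the continuous label α, so that the integrals in Eq. (2.4.3) become sums. Everything specific in the sketch below is an assumption made for illustration: the four states, the symmetric rate table, the initial probabilities, and the simple Euler time step.

```python
import math

def step(P, rate, dt):
    """One Euler step of the discrete analogue of Eq. (2.4.3)."""
    n = len(P)
    return [P[a] + dt * sum(P[b] * rate[b][a] - P[a] * rate[a][b] for b in range(n))
            for a in range(n)]

def H(P):
    """Discrete analogue of Eq. (2.4.2): sum of P ln P over states with P > 0."""
    return sum(p * math.log(p) for p in P if p > 0.0)

n = 4
# Symmetric rates, rate[a][b] == rate[b][a], as under time-reversal invariance.
rate = [[0.0 if a == b else 1.0 / (1 + abs(a - b)) for b in range(n)] for a in range(n)]

P = [0.7, 0.2, 0.1, 0.0]
history = [H(P)]
for _ in range(2000):
    P = step(P, rate, dt=0.01)
    history.append(H(P))

print(history[0], history[-1], -math.log(n))
print(P)
```

With no conserved quantity other than the total probability, H falls steadily toward its minimum, $-\ln 4$, reached when all four states are equally likely. Making the rate table asymmetric while keeping the sum rule (2.4.7) satisfied would be a natural further experiment.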

Let us pause for a moment to reflect how remarkable is this result. The 
decrease of H with time indicates a fundamental difference between past 
and future, even though this result would hold even if the underlying micro- 
scopic laws of physics were entirely symmetric under the direction of time’s 
flow, and indeed as we have seen it was first derived under the assumption of 


6 K. R. Schubert et al., Phys. Lett. 13, 138 (1964). This had been strongly suggested by an earlier experiment 
of J. H. Christenson, J. W. Cronin, V. L. Fitch, and R. Turlay, Phys. Rev. Lett. 13, 138 (1964). 
7 For a very general proof, with references to earlier work by others, see S. Weinberg, The Quantum Theory 
of Fields, Vol. 1, pp. 150-151 (Cambridge University Press, Cambridge, UK, 1995, 2005). 
time-reversal invariance. This distinction between past and future is obvious in 
everyday life: A glass tumbler that falls on the floor will shatter, giving up its 
kinetic energy to heat in the floor, but glass fragments lying on the floor will 
not draw energy from the floor and leap up to reassemble as a tumbler. But 
from where does this distinction come? It is the introduction of the concept 
of probability into physics that creates an asymmetry between past and future. 
We can try rewriting the fundamental equation (2.4.3) for the rate of change of 
probability by replacing t with $-t$, 

$$ \frac{dP(\alpha)}{dt} = \int d\beta\,\left[ P(\beta)\,[-\Gamma(\beta \to \alpha)] - P(\alpha)\,[-\Gamma(\alpha \to \beta)] \right] , $$

but then we would have to replace the rates with $-\Gamma$, which makes no sense 
because these rates have to be positive. It is the condition that $\Gamma > 0$ 
together with Eq. (2.4.3) that fixes the direction of time's flow. We see this 
also in the derivation of Eq. (2.4.5), which follows from Eq. (2.4.4) only if we 
assume that $\Gamma(\beta \to \alpha)$ is positive. 


Canonical and Grand Canonical Ensembles 


Let's now return to the H-theorem. The decrease in H will stop when H reaches 
a minimum value, at which it is stationary for any physically possible 
infinitesimal change in $P(\alpha)$. For an arbitrary infinitesimal change 
$\delta P(\alpha)$, we have 

$$ \delta H = \int \delta P(\alpha)\,d\alpha\,[\ln P(\alpha) + 1] . $$

Now, $\delta P(\alpha)$ is not entirely arbitrary but is constrained by the 
condition that variations in P cannot change either the total probability 
$\int P(\alpha)\,d\alpha = 1$ or the mean value of any conserved quantity such as 
the total energy $E(\alpha)$. In order that $\delta H$ should vanish for any 
variation in $P(\alpha)$ that preserves $\int d\alpha\,P(\alpha)$ and the mean 
values of all conserved quantities, it is necessary and sufficient that 
$\ln P(\alpha)$ should be a linear combination of a constant and any conserved 
quantities. For instance, if the total energy $E(\alpha)$ is the only conserved 
quantity (as it is for radiation) then if we denote the coefficient of 
$E(\alpha)$ in this linear combination as $-1/\Theta$ we have 

$$ P(\alpha) = \exp\left[C - \frac{E(\alpha)}{\Theta}\right] , $$

with the constant factor $e^C$ fixed by the requirement that 
$\int P(\alpha)\,d\alpha = 1$. We will show below that, with this probability 
distribution, the quantity $-H$ has the defining property (2.3.1) of entropy 
provided that $\Theta$ is proportional to the absolute temperature T, 
$\Theta = kT$, so the canonical ensemble is usually written 

$$ P(\alpha) = \exp\left[C - \frac{E(\alpha)}{kT}\right] . \tag{2.4.8} $$

The value of k expresses how we convert units of temperature into units of 
energy, and since it is just a matter of our system of units it cannot depend on 
what sort of system is described by this distribution. 
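
The statement that the canonical distribution makes H stationary, and in fact minimal, under the allowed variations can be illustrated with a three-level system. This is a sketch under assumed inputs: the equally spaced energies, the value kT = 1 in the same units, and the particular variation proportional to (1, -2, 1) are all chosen for convenience and are not from the text.

```python
import math

def canonical(energies, kT):
    """Discrete analogue of Eq. (2.4.8): P_n = exp(C - E_n/kT), with exp(-C) the normalizing sum."""
    weights = [math.exp(-E / kT) for E in energies]
    Z = sum(weights)
    return [w / Z for w in weights]

def H(P):
    return sum(p * math.log(p) for p in P)

energies = [0.0, 1.0, 2.0]   # illustrative, equally spaced levels
P = canonical(energies, kT=1.0)

def varied(eps):
    """Shift P by eps * (1, -2, 1), which changes neither the total
    probability nor the mean energy for these three equally spaced levels."""
    return [P[0] + eps, P[1] - 2.0 * eps, P[2] + eps]

print(H(P), H(varied(0.01)), H(varied(-0.01)))
```

Both signs of the variation raise H, and the increase is second order in `eps`, as expected at a constrained minimum.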

More generally, there may be some other conserved quantities $N_i$. For 
instance, in a gas consisting of molecules of different types, even if these 
molecules are undergoing chemical reactions, under ordinary conditions the 
numbers $N_i$ of atoms of type i do not change. In such cases, $\ln P(\alpha)$ will 
in general contain a term proportional to each conserved quantity 
$N_i(\alpha)$, with a coefficient that we will denote as $\mu_i/kT$, where 
$\mu_i$ (or sometimes $\mu_i/kT$) is a quantity known as the chemical potential. 
The probability density is then 

$$ P(\alpha) = \exp\left[C - \frac{E(\alpha)}{kT} + \sum_i \frac{\mu_i N_i(\alpha)}{kT}\right] . \tag{2.4.9} $$

A multi-particle system with probabilities distributed in this way is said to 
form a grand canonical ensemble. For instance, in a gas of H2, O2, and H2O 
molecules there are two chemical potentials, for hydrogen and oxygen atoms, 
so in equilibrium at a given temperature we can derive one set of ratios among 
the three molecular densities without knowing anything about the values of 
the chemical potentials, but we need to know these potentials to derive all the 
densities. 


Connection with Thermodynamics 


Aside from a constant factor, H is the entropy, as defined by Clausius in thermodynamics.
To see this, suppose we slowly add heat $dQ$ to our system, preserving
the equilibrium form (2.4.8) or (2.4.9) of the distribution $P$ and the values of all
conserved quantities other than the energy, but shifting the average total energy
$\bar{E}$ by $dQ$. Then


$$\delta H = \int d\alpha\, \delta P(\alpha)\, [\ln P(\alpha) + 1] = -\frac{1}{kT} \int d\alpha\, \delta P(\alpha)\, E(\alpha) = -\frac{dQ}{kT} ,$$


which is the defining equation $dS = dQ/T$ of the entropy if (apart from an
arbitrary constant term) we define


$$S = -kH = -k \int d\alpha\, P(\alpha) \ln P(\alpha) , \tag{2.4.10}$$


thus justifying Eq. (2.3.4). The decrease in H implies the increase in entropy, 
thus justifying one consequence of the second law of thermodynamics. This was 
shocking to some physicists of the nineteenth century, who regarded thermody- 
namics as an independent theory, just as fundamental as Newtonian mechanics. 
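
The relation $dS = dQ/T$ can be checked numerically for a discrete set of energy levels. In the sketch below heat is added by raising the temperature slightly at fixed levels, so that the change in mean energy is all heat, and the change in $S/k = -H$ is compared with the change in mean energy. The energy levels, the step size, and the function names are assumptions chosen for illustration.

```python
import math

def canonical(energies, kT):
    """Discrete canonical distribution, P_i proportional to exp(-E_i/kT)."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, kT):
    return sum(p * e for p, e in zip(canonical(energies, kT), energies))

def entropy_over_k(energies, kT):
    """S/k = -H = -sum P ln P for the canonical distribution."""
    return -sum(p * math.log(p) for p in canonical(energies, kT))

def ds_over_dq(energies, kT, h=1e-5):
    """(change in S/k) / (change in mean energy) for a small change in kT."""
    ds = entropy_over_k(energies, kT + h) - entropy_over_k(energies, kT - h)
    dq = mean_energy(energies, kT + h) - mean_energy(energies, kT - h)
    return ds / dq

LEVELS = [0.0, 1.0, 2.5, 4.0]   # hypothetical levels, arbitrary energy units
```

For any choice of levels, `ds_over_dq(LEVELS, kT)` should come out close to `1/kT`, which is $dS = dQ/T$ in units where $k = 1$.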


Compound Systems


Equation (2.4.10) makes it easy to justify a fundamental property of the entropy,
that it is extensive. Suppose a system can be regarded as composed of two parts,
whose states are described by parameters $\alpha_1$ and $\alpha_2$, and that the probabilities in
these two parts are uncorrelated, so that the probability $P(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2$ that
the system is in a state with parameters in the infinitesimal ranges $d\alpha_1$ and $d\alpha_2$
around $\alpha_1$ and $\alpha_2$ is a product of probabilities for the separate parts:


$$P(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2 = P_1(\alpha_1)\, d\alpha_1 \times P_2(\alpha_2)\, d\alpha_2 , \tag{2.4.11}$$


with

$$\int d\alpha_1\, P_1(\alpha_1) = \int d\alpha_2\, P_2(\alpha_2) = 1 .$$


Then Eq. (2.4.10) gives


$$S = -k \int d\alpha_1\, d\alpha_2\, P_1(\alpha_1) P_2(\alpha_2) \left[\ln P_1(\alpha_1) + \ln P_2(\alpha_2)\right] = S_1 + S_2 , \tag{2.4.12}$$


where $S_1$ and $S_2$ are the entropies of the two parts of the system:


$$S_1 = -k \int d\alpha_1\, P_1(\alpha_1) \ln P_1(\alpha_1) , \qquad S_2 = -k \int d\alpha_2\, P_2(\alpha_2) \ln P_2(\alpha_2) .$$


More generally, the difference $S - S_1 - S_2$ is a measure of the degree to which
probabilities in the two subsystems are correlated, and is known as the entanglement
entropy.
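
A small discrete example makes the additivity concrete. The sketch below computes $S - S_1 - S_2$ in units of $k$ for a table of joint probabilities: it vanishes when the table is a product of the two marginal distributions, and is negative when the two parts are correlated. The two 2x2 tables and the function names are hypothetical, chosen only for illustration.

```python
import math

def entropy_over_k(probs):
    """S/k = -sum P ln P, with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def marginals(joint):
    """Row and column sums of a joint probability table."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    return p1, p2

def entropy_excess(joint):
    """Return (S - S1 - S2)/k for a joint probability table."""
    p1, p2 = marginals(joint)
    s = entropy_over_k([p for row in joint for p in row])
    return s - entropy_over_k(p1) - entropy_over_k(p2)

# Hypothetical tables: the first is a product P1 x P2, the second is not.
UNCORRELATED = [[0.3 * 0.6, 0.3 * 0.4], [0.7 * 0.6, 0.7 * 0.4]]
CORRELATED = [[0.4, 0.1], [0.1, 0.4]]
```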


Gases 


In a gas $E(\alpha)$ is the sum of the energies $E_a$ of the individual particles. The probability
distribution (2.4.9) is then equal to a product of probability distributions
for the individual particles:


$$P(\alpha) = \prod_a p(E_a, N_{ia}) , \tag{2.4.13}$$


where $p(E_a, N_{ia})$ are the probability distribution functions for the individual
particle properties:


$$p(E_a, N_{ia}) \propto \exp\left[-\left(E_a + \sum_i N_{ia}\, \mu_i\right) \Big/ kT\right] , \tag{2.4.14}$$


in which $N_{ia}$ is the number of atoms of type $i$ in the $a$th molecule. The constant
of proportionality must be chosen to make the individual total probabilities
equal to unity. If all the molecules have the same chemical formula, so
that $N_{ia} = N_i$ is the same for all molecules, then we can absorb the factor
$\exp(-\sum_i N_i \mu_i/kT)$ into the constant of proportionality, and simply write


$$P(\alpha) = \prod_a p(E_a) \quad \text{where} \quad p(E_a) \propto \exp(-E_a/kT) . \tag{2.4.15}$$


In particular, the distribution of the momentum $\mathbf{p}_a$ arises from the kinetic energy
term $\mathbf{p}_a^2/2m_a$ in $E_a$, and Eq. (2.4.15) yields the Maxwell distribution (2.4.1) but,
as we have now seen, derived by Boltzmann in a more convincing way.


Equipartition 


One of the most useful results of statistical mechanics is the equipartition of
energy in cases where the total energy $E(\alpha)$ can be written as the sum of
individual energies proportional to squares of independent quantities $\xi_n$:


$$E(\alpha) = \sum_n c_n \xi_n^2 . \tag{2.4.16}$$


For instance, for a monatomic gas of $N$ atoms of mass $m$, the index $n$ runs
over $3N$ values; the $\xi_n$ are the three components of each atom’s momentum
and $c_n = 1/2m$. Molecules that are not monatomic can rotate as well as move.
Here $n$ runs over $6N$ values, with the $\xi_n$ including the three components of each
molecule’s momentum and the three components of its angular momentum. For an
angular momentum $\mathbf{J}$, the rotational energy is


$$\frac{J_1^2}{2I_1} + \frac{J_2^2}{2I_2} + \frac{J_3^2}{2I_3} ,$$


where the $I_i$ characterize the moments of inertia of the molecule. Here the
extra $\xi_n$ variables are the components of angular momentum, with the $c_n$ the
corresponding values of $1/2I$. But for a gas of diatomic molecules there is
essentially no energy in rotations around the line separating the atoms, so here
the $\xi_n$ include only the components of each molecule’s angular momentum $\mathbf{J}$
in the two directions normal to this line, and $n$ runs over $5N$ values. For an
ensemble of simple harmonic oscillators the $\xi_n$ include both the displacement
from rest of each oscillator and the displacement’s rate of change. As we shall
see in Section 3.1, the energy of a radiation field can also be expressed as
in Eq. (2.4.16), with the $\xi_n$ the Fourier transforms of each component of the
electric and magnetic fields.

Whatever the nature of the $\xi_n$, because of the factorization of the exponential
the probability of finding any one $\xi_n$ in a range $d\xi_n$ takes the form
$A_n \exp(-E_n/kT)\, d\xi_n$ with $E_n = c_n \xi_n^2$ and with proportionality constant $A_n$
fixed by the condition that the total probability for each $\xi_n$ is unity, so that
$A_n \int \exp(-E_n/kT)\, d\xi_n = 1$. Thus the mean value of $E_n$ is

$$\begin{aligned}
\overline{E_n} &= \frac{\int_{-\infty}^{\infty} d\xi_n\, c_n \xi_n^2 \exp(-c_n \xi_n^2/kT)}{\int_{-\infty}^{\infty} d\xi_n\, \exp(-c_n \xi_n^2/kT)} = \frac{\int_0^\infty d\sqrt{E_n}\, E_n \exp(-E_n/kT)}{\int_0^\infty d\sqrt{E_n}\, \exp(-E_n/kT)} \\
&= \frac{\int_0^\infty dE_n\, E_n^{1/2} \exp(-E_n/kT)}{\int_0^\infty dE_n\, E_n^{-1/2} \exp(-E_n/kT)} = \frac{(kT)^{3/2}\, \Gamma(3/2)}{(kT)^{1/2}\, \Gamma(1/2)} = kT/2 .
\end{aligned} \tag{2.4.17}$$


It is a fortunate aspect of kinetic theory that these mean energies do not depend
on the coefficients $c_n$, or indeed on much else about the physical system aside
from the distribution of the total energy among individual quadratic degrees of
freedom.
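
The independence of the result (2.4.17) from the coefficient $c_n$ is easy to confirm by direct numerical integration. The sketch below evaluates the ratio of integrals in the first line of Eq. (2.4.17) with a midpoint rule; the cutoff, the number of steps, and the function name are assumptions made for this illustration.

```python
import math

def mean_quadratic_energy(c, kT, steps=20000):
    """Average of E = c*xi**2 with weight exp(-E/kT), by the midpoint rule."""
    half_width = 12.0 * math.sqrt(kT / c)   # the weight is about exp(-144) at the cutoff
    d_xi = 2.0 * half_width / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        xi = -half_width + (i + 0.5) * d_xi
        energy = c * xi * xi
        weight = math.exp(-energy / kT)
        numerator += energy * weight        # the common factor d_xi cancels in the ratio
        denominator += weight
    return numerator / denominator
```

Calling `mean_quadratic_energy(c, kT)` for widely different values of `c` should give `kT/2` each time, to the accuracy of the quadrature.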

In any gas the kinetic energy of the $n$th particle is $m_n \mathbf{v}_n^2/2$. The average of
each of the three terms in this kinetic energy is $kT/2$, so the average kinetic
energy of each particle is $3kT/2$. Equation (1.1.4) gives $m\overline{v^2}/3 = kT$, where
this $k$ is the constant $k$ in the gas constant (1.2.4), so we see that this $k$ is the
same as the constant $k$ in the general probability distribution (2.4.8) or (2.4.9)
of statistical mechanics.

For a generic polyatomic molecule the mean rotational energy associated
with the three degrees of freedom $J_i$ is $3kT/2$, but, as already mentioned,
for a diatomic molecule meaningful rotation is only possible around the two
axes perpendicular to the linear molecule, so the mean rotational energy is only
$2kT/2$. That is, if we write the mean translational plus rotational energy per
molecule as $3kT/2 \times (1 + f)$, as in Section 2.1, then $f = 0$ for monatomic
molecules, $f = 2/3$ for diatomic molecules, while $f = 1$ for other molecules.
Equation (2.1.9) gives the specific heat ratio as $\gamma = 1 + 2/[3(1 + f)]$, so $\gamma = 5/3$
for monatomic gases, $\gamma = 7/5$ for diatomic molecules (which explains why
experiments on gases like O2 and H2 gave results near $\gamma = 1.4$ in Clausius’
time), and $\gamma = 4/3$ for other molecules.
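
The three values of $\gamma$ follow from one line of arithmetic, sketched here with exact fractions. The function and dictionary names are illustrative; the formula is the one quoted above from Eq. (2.1.9).

```python
from fractions import Fraction

def specific_heat_ratio(f):
    """gamma = 1 + 2/[3(1 + f)], the form quoted from Eq. (2.1.9)."""
    f = Fraction(f)
    return 1 + Fraction(2) / (3 * (1 + f))

# Values of f for the three cases discussed in the text.
CASES = {
    "monatomic": Fraction(0),
    "diatomic": Fraction(2, 3),
    "other": Fraction(1),
}

if __name__ == "__main__":
    for name, f in CASES.items():
        print(name, specific_heat_ratio(f))
```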

Of course, molecules can also vibrate as well as rotate and move, and energy 
can also go into exciting the clouds of electrons that hold them together. For 
reasons that only became clear with the advent of quantum mechanics, these 
degrees of freedom can only be excited at temperatures much higher than is 
common in our environment. 


Entropy as Disorder 


The entropy can be regarded as a measure of the disorder of a system. To
see this, it is easiest to approximate the parameters of a system as taking a
discrete set of values $\alpha_\nu$ instead of a continuum of values $\alpha$. We can connect
the continuum and discrete descriptions by dividing the continuum into tiny
ranges $\alpha_\nu < \alpha < \alpha_\nu + \delta\alpha$ (for simplicity treating $\alpha$ here as if it were
one-dimensional) and approximating $P(\alpha)$ as a constant $P_\nu/\delta\alpha$ in each interval, so
that the probability that $\alpha$ is in this interval is

$$\int_{\alpha_\nu}^{\alpha_\nu + \delta\alpha} d\alpha\, P(\alpha) = \delta\alpha \left(\frac{P_\nu}{\delta\alpha}\right) = P_\nu . \tag{2.4.18}$$


Then the entropy (2.4.10) is


$$S = -k \sum_\nu \delta\alpha \left(\frac{P_\nu}{\delta\alpha}\right) \ln\left(\frac{P_\nu}{\delta\alpha}\right) = \mathcal{C} - k \sum_\nu P_\nu \ln P_\nu , \tag{2.4.19}$$

where $\mathcal{C}$ is a constant,


$$\mathcal{C} = k \ln \delta\alpha . \tag{2.4.20}$$


Since there was an arbitrary constant in Eq. (2.4.10), we can absorb $\mathcal{C}$ into the
definition of that constant, and define the entropy simply as


$$S = -k \sum_\nu P_\nu \ln P_\nu . \tag{2.4.21}$$


Since $0 \le P_\nu \le 1$, each $\ln P_\nu$ is negative or zero, so $S$ is never negative. The entropy reaches
its minimum value, zero, in the completely ordered state in which just a single
$P_\nu$ equals one and all others vanish. In disordered systems with non-vanishing
probabilities for different states $S$ is positive-definite. In the completely disordered
state with all $P_\nu$ equal, the entropy reaches its maximum possible value,
equal to $k$ times the logarithm of the number $1/P_\nu$ of intervals.
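
The two limiting cases can be checked directly from Eq. (2.4.21). The sketch below evaluates $S/k$ for a completely ordered distribution, a completely disordered one, and a hypothetical intermediate one over eight intervals; the helper names and the choice of eight intervals are assumptions for illustration.

```python
import math

def entropy_over_k(probs):
    """S/k = -sum P ln P, Eq. (2.4.21), with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ordered(n):
    """One interval has probability one, all others zero."""
    return [1.0] + [0.0] * (n - 1)

def uniform(n):
    """All n intervals equally likely."""
    return [1.0 / n] * n

# A hypothetical partially ordered distribution over eight intervals.
PARTIAL = [0.5, 0.25, 0.125, 0.125, 0.0, 0.0, 0.0, 0.0]
```

Here `entropy_over_k(ordered(8))` vanishes, `entropy_over_k(uniform(8))` equals $\ln 8$, and the intermediate distribution falls between the two.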


2.5 Transport Phenomena 


So far, we have been concerned with systems in which thermodynamic variables 
such as temperature, pressure, density, etc. are constant in time and space, or 
vary very slowly. But many of the most interesting physical phenomena are 
associated with the transport of such quantities over time from one place to 
another in inhomogeneous media. As we shall see in the following section, the 
study of such transport phenomena gave physicists their first reliable values for 
the masses of individual atoms and molecules. 


Conservation Laws 


In many cases we have to deal with conserved quantities, such as the number of
molecules or the total electric charge. By a quantity being conserved is meant
that the net rate of increase of the quantity (negative if a decrease) in any volume
plus the net rate at which this quantity flows out of the volume (negative if
flowing in) vanishes. The current $\mathbf{J}$ of this quantity is defined so that the net rate
of outward flow is $\int_A d\mathbf{A} \cdot \mathbf{J}$, where $A$ is the surface surrounding the volume $V$,
and $d\mathbf{A}$ is an element of area of this surface, taken as a vector pointing outward
from the surface. Hence if this quantity has a density $\mathcal{N}$ and a current $\mathbf{J}$, then
the conservation condition is


$$\frac{d}{dt} \int_V d^3x\, \mathcal{N} + \int_A d\mathbf{A} \cdot \mathbf{J} = 0 . \tag{2.5.1}$$


Using Gauss’s theorem, we can write the second term in Eq. (2.5.1) as an
integral over the volume of the divergence of the current, so Eq. (2.5.1) is
equivalent to

$$\int_V d^3x \left[\frac{\partial \mathcal{N}}{\partial t} + \nabla \cdot \mathbf{J}\right] = 0 ,$$


and, since this must be true for any volume, the integrand must vanish:

$$\frac{\partial \mathcal{N}}{\partial t} + \nabla \cdot \mathbf{J} = 0 . \tag{2.5.2}$$


For instance, if matter is carried from one place to another only by a bulk motion
with velocity $\mathbf{v}$, then the mass density $\rho$ satisfies an equation of the form (2.5.2),
with the mass current given by $\rho\mathbf{v}$:


$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0 . \tag{2.5.3}$$
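
The content of Eq. (2.5.3) can be seen in a one-dimensional numerical sketch: if the density in each cell changes only through the current crossing its faces, the total mass cannot change. The upwind scheme, the periodic grid, the constant positive velocity, and the function names below are all choices made for this illustration rather than anything taken from the text.

```python
def upwind_step(rho, v, dx, dt):
    """One explicit step of d(rho)/dt + d(rho*v)/dx = 0 on a periodic grid.

    Assumes constant v > 0. The update is in flux form: whatever leaves one
    cell through a face enters its neighbor.
    """
    flux = [r * v for r in rho]             # mass current through the right face of each cell
    n = len(rho)
    # flux[i - 1] with i = 0 wraps around to the last cell (periodic boundary)
    return [rho[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]

def total_mass(rho, dx):
    """Integral of the density over the grid."""
    return sum(rho) * dx

def evolve(rho, v, dx, dt, steps):
    """Apply upwind_step repeatedly."""
    for _ in range(steps):
        rho = upwind_step(rho, v, dx, dt)
    return rho
```

Evolving any initial density profile with `evolve` moves and smears the profile but leaves `total_mass` unchanged up to rounding error.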


Momentum Flow 


Such conservation laws are ubiquitous in physics. We will be concerned now
with a particular set of conserved quantities in fluids, the components of momentum.
The density of the $i$th component (with $i = 1, 2, 3$) of momentum is
$\rho v_i$, where $\rho$ is the mass density and $v_i$ is the $i$th component of the bulk velocity.
Their conservation provides the fundamental dynamical equation for fluids. The
conservation equation here takes the general form


$$\frac{\partial}{\partial t}(\rho v_i) + \sum_j \frac{\partial}{\partial x_j} T_{ji} = 0 , \tag{2.5.4}$$


where $T_{ji}$ is the $j$th component of the current of the $i$th component of momentum,
and the sums here and below run over the directions 1, 2, 3.⁸


8 $T_{ij}$ is the purely spatial part of a larger array, a tensor with time as well as space components that serves in
the general theory of relativity as the source of the gravitational field.


By analogy with the case of the mass current $\rho\mathbf{v}$, we might think that the $j$th
component $T_{ji}$ of the current of the $i$th component of momentum is $\rho v_i \times v_j$.
This would be the case if momentum like mass were carried from place to place
only by the bulk motion of the fluid. But of course fluid elements exert forces
on one another, both pressure and viscous forces, with a consequent transfer of
momentum. So, to keep an open mind, let us write the $j$th component of the
current of the $i$th component of momentum as


$$T_{ji} = \rho v_j v_i + \tau_{ji} , \tag{2.5.5}$$


with $\tau_{ji}$ a correction term arising from forces acting within the fluid. According
to Eq. (2.5.4), the $i$-component of the internal force per unit volume is
$-\sum_j (\partial/\partial x_j)\, \tau_{ji}$.

So what is $\tau_{ji}$? An answer was first given in 1822 by Claude-Louis Navier
(1785-1836), of the Corps des Ponts et Chaussées, and later in his own formulation
by Sir George Stokes (1819-1903). Rather than trying to reproduce their
reasoning, we give a treatment below that has a more modern flavor, relying
largely on principles of invariance.

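First, we can learn a little about the momentum current $T_{ji}$ by imposing
the condition that angular momentum should satisfy a conservation condition.
The density of the $i$th component of angular momentum is $\rho(\mathbf{x} \times \mathbf{v})_i$, so for
instance the rate of change of its $i = 3$ component is

$$\begin{aligned}
\frac{\partial}{\partial t}\Big(\rho(\mathbf{x} \times \mathbf{v})_3\Big) &= \frac{\partial}{\partial t}\Big(x_1 \rho v_2 - x_2 \rho v_1\Big) = -x_1 \sum_j \frac{\partial T_{j2}}{\partial x_j} + x_2 \sum_j \frac{\partial T_{j1}}{\partial x_j} \\
&= -\sum_j \frac{\partial}{\partial x_j}\Big[x_1 T_{j2} - x_2 T_{j1}\Big] + T_{12} - T_{21} .
\end{aligned}$$


In order for this to take the form of a conservation law we must have $T_{12} = T_{21}$,
and, since there is nothing special about the 1- and 2-directions, $T_{ij}$ must be
entirely symmetric,


$$T_{ij} = T_{ji} , \tag{2.5.6}$$

and then of course the same is also true of the term $\tau_{ij}$ in Eq. (2.5.5):

$$\tau_{ij} = \tau_{ji} . \tag{2.5.7}$$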

Next, we assume that there are no preferred directions in the environment of
the fluid, so that $\tau_{ji}$ is a spatial tensor; that is, it transforms under rotations
in such a way that $\sum_{ij} a_i a_j \tau_{ji}$ is invariant under rotations for any vector $\mathbf{a}$,
such as $\mathbf{v}$. In the absence of external fields, the tensor $\tau_{ij}$ must be constructed
from rotationally invariant quantities like $\rho$ and $T$ and vectors like $\mathbf{v}$, together
with their space and time derivatives, but no other vectors that would reflect a
preferred direction in the environment.

One obvious such tensor is $\delta_{ij}$ times any function $f$ of rotationally invariant
quantities, where $\delta_{ij}$ is the diagonal matrix with all ones on the main diagonal.
Here $\sum_{ij} a_i a_j \delta_{ji} f$ is the rotational invariant $\mathbf{a}^2 f$. We can separate a term of
this form in $\tau_{ij}$ by defining a quantity

$$p \equiv \frac{1}{3} \sum_i \tau_{ii} \tag{2.5.8}$$


and writing $\tau_{ji}$ as

$$\tau_{ji} = p\, \delta_{ji} + \Delta\tau_{ji} , \tag{2.5.9}$$

where $\Delta\tau_{ji}$ is both symmetric and traceless:


$$\Delta\tau_{ij} = \Delta\tau_{ji} , \qquad \sum_i \Delta\tau_{ii} = 0 . \tag{2.5.10}$$


The term $p\, \delta_{ji}$ in $\tau_{ji}$ gives a force per unit volume $-\nabla p$ in Eq. (2.5.4), so $p$
can be identified as the fluid’s pressure. Of course, there is an infinite number of
ways of constructing the symmetric traceless tensor $\Delta\tau_{ij}$ from the velocity and
rotational invariants and their derivatives. One simple example is
$[v_i v_j + v_j v_i - 2\delta_{ij} \mathbf{v}^2/3]\, f$, where $f$ is any function of the rotational invariants. Fortunately,
we can eliminate many of these possibilities (including this one) by using the
principle of Galilean relativity.


Galilean Relativity 


The principle of Galilean relativity⁹ requires that the laws governing fluids
should be the same for an observer $O$ who uses space coordinates $\mathbf{x}$ and for
an observer $O'$ moving at any constant velocity $-\mathbf{u}$ with respect to $O$, and who
therefore uses coordinates related to those of $O$ by


$$\mathbf{x}' = \mathbf{x} + \mathbf{u} t . \tag{2.5.11}$$


Aside from this change of coordinates, the moving observer sees a mass density
$\rho'$ that is the same as $\rho$:


$$\rho'(\mathbf{x}', t) = \rho(\mathbf{x}, t) . \tag{2.5.12}$$


But, for the observer $O'$, his own velocity $-\mathbf{u}$ is subtracted from the velocity
seen by observer $O$:


$$\mathbf{v}'(\mathbf{x}', t) = \mathbf{v}(\mathbf{x}, t) + \mathbf{u} . \tag{2.5.13}$$


To check whether the equation (2.5.3) of mass conservation is left invariant by
Galilean transformations, take the partial derivative of Eq. (2.5.12) with respect
to time, holding $\mathbf{x}$ (but not $\mathbf{x}'$!) fixed:


$$\frac{\partial \rho'(\mathbf{x}', t)}{\partial t} + \sum_i u_i \frac{\partial \rho'(\mathbf{x}', t)}{\partial x_i'} = \frac{\partial \rho(\mathbf{x}, t)}{\partial t} .$$


Therefore

$$\begin{aligned}
\frac{\partial \rho'(\mathbf{x}', t)}{\partial t} + \nabla' \cdot \big(\mathbf{v}'(\mathbf{x}', t)\, \rho'(\mathbf{x}', t)\big)
&= \frac{\partial \rho'(\mathbf{x}', t)}{\partial t} + \mathbf{u} \cdot \nabla' \rho'(\mathbf{x}', t) + \nabla' \cdot \big((\mathbf{v}'(\mathbf{x}', t) - \mathbf{u})\, \rho'(\mathbf{x}', t)\big) \\
&= \frac{\partial \rho(\mathbf{x}, t)}{\partial t} + \nabla \cdot \big(\mathbf{v}(\mathbf{x}, t)\, \rho(\mathbf{x}, t)\big) = 0 ,
\end{aligned}$$


and so the equation (2.5.3) of mass conservation does satisfy the principle of
Galilean relativity.

By following the same reasoning, we can see that the momentum conservation
law (2.5.4) would be invariant under Galilean transformations if $T_{ji}$ were
simply given by the term $\rho v_j v_i$ in Eq. (2.5.5). Hence the principle of Galilean
relativity requires that the term $\tau_{ji}$ in Eq. (2.5.5) be separately invariant under
Galilean transformations, and according to Eqs. (2.5.8) and (2.5.9) the same
must be true of $p$ and $\Delta\tau_{ij}$. Because of the term $\mathbf{u}$ in the Galilean transformation
(2.5.13), Galilean relativity rules out terms in $\Delta\tau_{ji}$ such as in the example
$v_i v_j + v_j v_i - 2\delta_{ij} \mathbf{v}^2/3$ mentioned above, which involves $\mathbf{v}$ itself rather than
its gradient.


9 In the twentieth century this came to be called Galilean relativity to distinguish it from the Einstein special
principle of relativity. Both principles state that the laws of nature are unaffected by the uniform motion of
an observer; as we will see in Chapter 4, it is only the details of the transformation to a moving frame of
reference that distinguishes Einsteinian from Galilean relativity.


Navier-Stokes Equation 


There is still an infinite variety of terms that might appear in $\Delta\tau_{ij}$, containing
any number of factors of gradients of any order of density and/or velocity. But
in order to keep the units consistent, the more gradients are contained as factors
in any term in $\Delta\tau_{ij}$, the more powers of some length that is characteristic of the
microscopic properties of the fluid must appear in the coefficient of that term.
If these lengths characterizing the fluid, such as the distance between molecules
and the mean free path, are all much less than the scale of distances over which
fluid properties such as density and velocity vary, then $\Delta\tau_{ij}$ is dominated by a
term proportional to the minimum number of gradients.¹⁰ So we should look
for a possible term in $\Delta\tau_{ij}$ proportional to a single gradient.

It is not possible to construct a symmetric traceless tensor proportional to a
single gradient of the density, so a tensor proportional to a single gradient must
be linear in the gradient of the velocity. There is a unique symmetric traceless
tensor of this sort:

$$\Delta\tau_{ji} = -\eta \left[\frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i} - \frac{2}{3}\, \delta_{ij}\, \nabla \cdot \mathbf{v}\right] , \tag{2.5.15}$$


where $\eta$ is a coefficient (Galilean-invariant, like $\rho$ and $p$) known as the viscosity
of the fluid.¹¹ A minus sign is inserted in Eq. (2.5.15) in order that, with $\eta > 0$,
the heat produced by viscous fluid flow should be positive.¹² Using Eqs. (2.5.5),
(2.5.9), and (2.5.15), we see that the momentum conservation equation (2.5.4)
takes the form


$$\frac{\partial}{\partial t}(\rho v_i) + \sum_j \frac{\partial}{\partial x_j}\left[\rho v_j v_i + p\, \delta_{ji} - \eta \left(\frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i} - \frac{2}{3}\, \delta_{ij}\, \nabla \cdot \mathbf{v}\right)\right] = 0 . \tag{2.5.16}$$


This is the Navier-Stokes equation.


10 This sort of reasoning has become common in the quantum theory of fields, leading to what are known as
effective-field theories.


Viscosity 


The measurement of viscosity was well within the capabilities of nineteenth
century physicists. In a classic calculation using the Navier-Stokes equation,
Stokes found that a uniform fluid with viscosity $\eta$ exerts a drag force $F$ on a
spherical ball of radius $a$ moving with velocity $v$ through the fluid, given by


$$F = 6\pi \eta a v . \tag{2.5.17}$$


For instance, if a ball of mass $m$ falls through a fluid, it accelerates until the
viscous force balances the force $mg$ of gravity (neglecting buoyancy), when it
has the terminal velocity

$$v_{\rm terminal} = \frac{mg}{6\pi \eta a} .$$
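
As a worked illustration of Eq. (2.5.17) and the terminal velocity that follows from it, the sketch below evaluates both for a small ball in a very viscous liquid. The numerical inputs (a 1 mm steel ball, a density of 7800 kg/m³, and a viscosity of 1.4 Pa s, roughly that of glycerin at room temperature) and the function names are assumptions chosen for illustration; buoyancy is neglected, as in the text.

```python
import math

def stokes_drag(eta, a, v):
    """Drag force of Eq. (2.5.17), F = 6*pi*eta*a*v."""
    return 6.0 * math.pi * eta * a * v

def terminal_velocity(m, eta, a, g=9.81):
    """Speed at which Stokes drag balances the weight m*g (buoyancy neglected)."""
    return m * g / (6.0 * math.pi * eta * a)

# Hypothetical example values in SI units.
RADIUS = 1.0e-3                                       # ball radius, m
MASS = 7800.0 * (4.0 / 3.0) * math.pi * RADIUS ** 3   # ball mass, kg
ETA = 1.4                                             # viscosity, Pa s

if __name__ == "__main__":
    v = terminal_velocity(MASS, ETA, RADIUS)
    print("terminal velocity (m/s):", v)
    print("drag at that speed (N):", stokes_drag(ETA, RADIUS, v))
    print("weight (N):", MASS * 9.81)
```

Measuring the terminal velocity of a ball of known mass and radius and inverting the same formula is one simple way of obtaining $\eta$.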


The viscosity of gases could also be measured by observing the effect of a 
surrounding gas on the motion of a pendulum. 

It was harder to calculate $\eta$ on the basis of a theory of molecules than to
measure it. For some time the best that could be done theoretically was a rough
estimate of this viscosity.

To make this estimate, consider a uniform fluid experiencing a shear flow.
For instance, suppose $\mathbf{v}$ has only one component, $v_1$, which depends only on $x_3$.
(The fluid could be enclosed between two flat plates, each in the 1-2 plane,
with their separation in the 3-direction, and with one of the plates moving in the
1-direction and the other at rest.) In this case, Eqs. (2.5.5), (2.5.9), and (2.5.15)
give the 3-component of the current of the 1-component of momentum:


$$T_{31} = \Delta\tau_{31} = -\eta\, \frac{\partial v_1}{\partial x_3} . \tag{2.5.18}$$


11 Often $\eta$ is called the shear viscosity. The reason is that, if we were to insist on using whatever formula for
the pressure $p$ holds in the absence of fluid gradients, then $p$ would not be precisely given by Eq. (2.5.8),
and $\Delta\tau_{ij}$ as defined by Eq. (2.5.9) would not be precisely traceless, so it would have a term proportional
to $\delta_{ij}(\nabla \cdot \mathbf{v})$, with a coefficient known as the bulk viscosity. For complicated reasons the bulk viscosity is
generally much less than the shear viscosity (for instance, see S. Weinberg, Astrophys. J. 168, 175 (1971))
and in any case would have no effect in our present calculation.

12 For the details of this argument, see Sections 16 and 49 of Landau and Lifshitz, Fluid Mechanics, listed
in the bibliography.


To find $\eta$, let us use molecular theory to calculate the rate per unit area at which
the 1-component of momentum crosses a plane normal to the 3-axis, which we
will take as the plane $x_3 = 0$. This current arises because, in addition to being
carried along in the 1-direction by the bulk velocity $\mathbf{v}$, each molecule has a fluctuating
“peculiar velocity” $\Delta\mathbf{v}$. We make the far-reaching approximation that,
because of rapid collisions, all directions of this peculiar velocity are equally
likely. Then the number per unit volume whose peculiar velocity vector $\Delta\mathbf{v}$
makes an angle with the +3-axis between $\theta$ and $\theta + d\theta$ is the ratio of the solid
angle $2\pi \sin\theta\, d\theta$ to $4\pi$, times the total number density $n$, or $n \sin\theta\, d\theta/2$. As
we saw in our calculation of gas pressure in Section 1.1, the number of these
molecules striking an area $dA$ in this plane in a time $dt$ is the number in a
cylinder with base $dA$ and height $\Delta v_3\, dt = \cos\theta\, |\Delta\mathbf{v}|\, dt$, where $\Delta v_3$ is the
component of the peculiar velocity normal to the plane $x_3 = 0$ and $|\Delta\mathbf{v}|$ is the
magnitude of the peculiar velocity. This number of molecules is equal to


$$dA \times \cos\theta\, |\Delta\mathbf{v}|\, dt \times n \sin\theta\, d\theta/2 .$$


Since $v_1$ is assumed to be a function $v_1(x_3)$ only of $x_3$, a molecule that reaches
the plane $x_3 = 0$ having traveled a distance $r$ will have a 1-component of
momentum $m v_1(-r\cos\theta)$, where $m$ is the mass of the molecule. (In addition to
the momentum carried by this bulk velocity, the peculiar momenta of molecules
will also have 1-components, but under the assumption that all directions of
peculiar velocity are equally likely, these 1-components cancel when we
integrate over the azimuthal angle around the 3-direction.) A minus sign
appears in the argument of $v_1$ because a molecule with a positive (or negative)
3-component $\Delta v_3$ of peculiar velocity, for which $\cos\theta > 0$ (or $\cos\theta < 0$),
arrives at the plane $x_3 = 0$ from negative (or positive) values of $x_3$. The rate
per unit area and per unit time at which the 1-component of momentum flows
through the plane $x_3 = 0$ is then


$$T_{31} = \int_0^\pi \frac{n\, \overline{|\Delta\mathbf{v}|}}{2}\, \cos\theta \sin\theta\, d\theta \int_0^\infty m\, v_1(-r\cos\theta)\, P(r)\, dr ,$$


where $P(r)\, dr$ is the probability that a molecule that reaches the plane $x_3 = 0$
has traveled a distance between $r$ and $r + dr$ since its last collision with another
molecule, and the bar again denotes an average over molecules. As long as the
mean distance between collisions is small compared with the scale of distances
over which the fluid properties vary, all directions are equivalent, and $P(r)\, dr$
is also the probability that from a random starting position a molecule will
travel a distance between $r$ and $r + dr$ before its first collision. (Note that this
formula applies for molecules with negative as well as positive values of $\cos\theta$,
because molecules with negative values of $\cos\theta$ have a negative 3-component
of peculiar velocity and therefore cross the plane $x_3 = 0$ traveling from positive
to negative values of $x_3$, and thus contribute a negative amount to the flow of
the 1-component of momentum through this plane.)

We again make the crucial assumption, which led to the Navier-Stokes equation,
that the typical distances traveled by molecules are much smaller than the
scale of distances over which the bulk properties of the fluid vary. Here this
implies that $v_1(-r\cos\theta)$ changes little over the range of $r$ for which $P(r)$ is
not negligible. This allows us to use a Taylor expansion

$$v_1(-r\cos\theta) = v_1(0) - r\cos\theta \left(\frac{\partial v_1}{\partial x_3}\right)_{x_3=0} + \cdots .$$

The first term makes no contribution to the current, because the integral over $\theta$
of $\cos\theta \sin\theta$ vanishes. This leaves us with the contribution of the next term,

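$$T_{31} = -\frac{m n \ell\, \overline{|\Delta\mathbf{v}|}}{2} \left(\frac{\partial v_1}{\partial x_3}\right)_{x_3=0} \int_0^\pi \cos^2\theta \sin\theta\, d\theta = -\frac{m n \ell\, \overline{|\Delta\mathbf{v}|}}{3} \left(\frac{\partial v_1}{\partial x_3}\right)_{x_3=0} ,$$


where $\ell$ is the mean free path¹³


$$\ell = \int_0^\infty r\, P(r)\, dr .$$


Comparing this with our formula (2.5.18) for $T_{31}$, taken from Eq. (2.5.15),
we find for $\eta$ the positive value


$$\eta = \frac{1}{3}\, n m \ell\, \overline{|\Delta\mathbf{v}|} . \tag{2.5.19}$$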


Mean Free Path 


Now we need to estimate the mean free path $\ell$. Suppose we make the crude
approximation that a molecule will collide with another molecule if its center
passes within an effective cross-sectional area $\sigma$ around the center of the other
molecule. (For instance, if molecules were balls of radii $a$, then they would
collide if their centers approached within a distance $2a$, so here $\sigma = \pi(2a)^2$.)
The probability that a molecule that has already traveled a distance $r$ without
colliding will collide before it travels a further distance $dr$ is the ratio of the
total effective area $4\pi r^2\, dr\, n\sigma$ of all the molecules in the shell between $r$ and
$r + dr$ to the area $4\pi r^2$ of this shell, and is therefore $n\sigma\, dr$. The probability
that the collision occurs in the distance between $r$ and $r + dr$ is then $n\sigma\, dr$
times the probability $p(r)$ that it had not collided before it had traveled the distance
$r$. To calculate $p(r)$, we note that $p(r + dr)$ equals $p(r)$ times the probability
$1 - n\sigma\, dr$ that the molecule will not collide before it travels to $r + dr$, so
$p'(r) = -p(r)\, n\sigma$ and, since $p(0) = 1$, the probability of traveling a distance
$r$ without colliding is $p(r) = \exp(-n\sigma r)$. The probability of a collision in a
distance from $r$ to $r + dr$ is then

$$P(r)\, dr = n\sigma\, dr \times p(r) = n\sigma\, dr\, \exp(-n\sigma r) .$$


The average distance traveled between collisions is then

$$\ell = \int_0^\infty r\, P(r)\, dr = \int_0^\infty r\, n\sigma\, dr\, \exp(-n\sigma r) = \frac{1}{n\sigma} . \tag{2.5.20}$$


This formula for $\ell$ is often used for media more complicated than a gas of hard
balls, by taking $\sigma$ as some sort of effective cross section.


13 The notion of a mean free path was introduced by Rudolf Clausius in “On the Mean Lengths of the Paths
Described by the Separate Molecules of Gaseous Bodies,” Ann. Phys. 105, 239 (1858).

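
The step-by-step argument for $p(r)$ translates directly into a small numerical sketch: advance in steps $dr$, let the molecule collide in each step with probability $n\sigma\, dr$, and accumulate the mean distance traveled. The step size, the cutoff, the units, and the function name are assumptions made for this illustration.

```python
def mean_free_path_numeric(n, sigma, dr, r_max):
    """Accumulate the mean of r over P(r) dr, stepping the survival probability.

    At each step the molecule collides with probability n*sigma*dr, as in the
    derivation of p(r). r_max should be many mean free paths, and dr much
    smaller than one.
    """
    survival = 1.0              # p(r): probability of no collision up to r
    r = 0.0
    mean_r = 0.0
    collide = n * sigma * dr    # collision probability in one step
    while r < r_max:
        mean_r += (r + 0.5 * dr) * survival * collide
        survival *= 1.0 - collide
        r += dr
    return mean_r
```

With `n * sigma = 2` in arbitrary inverse-length units, a small step, and a cutoff of many mean free paths, the result should approach `1 / (n * sigma) = 0.5`, in agreement with Eq. (2.5.20).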
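Using the result (2.5.20) in Eq. (2.5.19) gives an estimate of the viscosity:

$$\eta = \frac{m\, \overline{|\Delta\mathbf{v}|}}{3\sigma} .$$

The Maxwell-Boltzmann distribution (2.4.1) gives the mean value of $|\Delta\mathbf{v}|$ as


$$\overline{|\Delta\mathbf{v}|} = \sqrt{\frac{kT}{2\pi m}} = \sqrt{\frac{RT}{2\pi\mu}} ,$$


where $R = k/m_1$ is the gas constant and $\mu = m/m_1$ is the molecular weight.
The viscosity is therefore


$$\eta = \frac{m}{3\sigma} \sqrt{\frac{RT}{2\pi\mu}} . \tag{2.5.21}$$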

Quantitatively this result gives only the correct order of magnitude of $\eta$, but
it has an important qualitative consequence, that the viscosity is independent of
the gas density. This result was first found by Maxwell.¹⁴ In a letter to Stokes,¹⁵
he commented that “This is certainly very unexpected, that the friction should
be as great in a rare gas as in a dense gas. The reason is that for the rare gas the
mean path is greater, so that the frictional action extends to greater distance.”
One reason for finding this result surprising is that it raises the question
whether a gas that is so rare that it is practically a vacuum can have any viscosity.
It was this point that had led Aristotle in his book Physics to argue
that a vacuum is impossible. He concluded from his experience with motion
under the influence of friction that the velocity imparted to a body by a given
force is inversely proportional to the resistance, thus anticipating Stokes’ law
(2.5.17), and so he reasoned that in a vacuum where there can be no resistance
all velocities would be infinite.


14 J. C. Maxwell, “Illustrations of the Dynamical Theory of Gases,” Phil. Mag. 19, 19; 20, 21 (1860).
15 Quoted on p. 27 of Brush, The Kinetic Theory of Gases, listed in the bibliography.

But, as we have seen, the derivation of the Navier-Stokes equation and
Stokes’ law rests on the assumption that the mean free path $\ell$ is much smaller
than the scale of distances over which the fluid velocity varies. For a gas
of sufficiently low density this will no longer be the case, and the concept
of viscosity loses any meaning. For instance, when a spacecraft or a missile
re-enters the Earth’s atmosphere at very great altitude, where the mean free
path is much larger than the dimensions of the re-entering body, the drag
force $F$ on the body is at first not proportional to its velocity as would be
required by Stokes’ law for spheres, but rather is $F = C_D \rho A v^2$. Here $\rho$ is the
air density; $A$ is the vehicle’s cross-sectional area; $v$ is its velocity; and $C_D$ is
a dimensionless “drag coefficient” that depends on the shape of the body. This
is the Knudsen regime. Only when the body reaches lower altitudes with much
smaller mean free paths does the drag force become proportional to velocity
and the body approach terminal velocity.

The fact that viscous drag is independent of gas density was regarded as 
a confirmation of the molecular theory of gas dynamics. Maxwell himself 
checked the validity of this result by measuring the viscosity of a gas at fixed 
temperature, with pressure and hence density varying by a factor 60. The 
observed constancy of viscosity over this large range of gas density tended to 
confirm the molecular theory of gases, but in itself it revealed nothing about the 
nature of molecules. 
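
The cancellation of the density in the viscosity estimate is easy to exhibit numerically. The sketch below combines Eq. (2.5.19) with the mean free path (2.5.20) and the mean peculiar speed in the form quoted in the text, and evaluates the result at two densities differing by a factor of 60. The nitrogen-like mass and cross section, the temperature, and the function name are assumptions chosen only for illustration, and the estimate is meant to give no more than the order of magnitude.

```python
import math

K_BOLTZMANN = 1.380649e-23   # Boltzmann constant, J/K

def viscosity_estimate(n, m, sigma, T):
    """Eq. (2.5.19) with the mean free path of Eq. (2.5.20).

    n: number density, m: molecular mass, sigma: effective cross section,
    T: absolute temperature (SI units). The mean peculiar speed is taken in
    the form quoted in the text, sqrt(kT / (2*pi*m)).
    """
    mean_free_path = 1.0 / (n * sigma)
    mean_speed = math.sqrt(K_BOLTZMANN * T / (2.0 * math.pi * m))
    return n * m * mean_free_path * mean_speed / 3.0

# Hypothetical, nitrogen-like inputs chosen only for illustration.
M = 4.65e-26       # molecular mass, kg
SIGMA = 4.0e-19    # effective cross section, m^2
T = 300.0          # temperature, K

if __name__ == "__main__":
    for n in (2.5e25, 2.5e25 / 60.0):
        print(n, viscosity_estimate(n, M, SIGMA, T))
```

The two printed viscosities agree to rounding error, because the factor $n$ in Eq. (2.5.19) is cancelled by the $1/n$ in the mean free path.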


Diffusion 


The general formulation above of the transport of momentum in a gas can be
extended to the transport of other physical quantities in general fluids. One such
quantity is the number density $\nu$ of particles suspended in a fluid. These can be
large molecules, such as molecules of sugar dissolved in water, or the tiny bits
of organic matter expelled from pollen grains noticed in 1827 by the botanist
Robert Brown (1773-1858), or artificial little balls used in studies of diffusion
to be discussed in the next section. The conservation of these particles requires
that their number density $\nu(\mathbf{x}, t)$ satisfies an equation of the general form (2.5.2):


$$\frac{\partial \nu}{\partial t} + \nabla \cdot (\mathbf{v}\nu + \mathbf{j}) = 0 , \tag{2.5.22}$$


where $\mathbf{v}$ is the fluid bulk velocity. As in Eq. (2.5.5) we again separate the
convective term $\mathbf{v}\nu$ in the current from the diffusion term $\mathbf{j}$. Since by itself
$(\partial/\partial t)\nu + \nabla \cdot (\mathbf{v}\nu)$ is Galilean-invariant, the diffusion term $\mathbf{j}$ must be a
Galilean-invariant vector, and if the scale over which the density $\nu$ varies is much larger
than relevant mean free paths then it is dominated by a term with a single
gradient, which can only be of the form

$$\mathbf{j} = -D \nabla \nu , \tag{2.5.23}$$


where $D$ is a coefficient known as the diffusion constant.
For instance, if the fluid is at rest and $D$ is independent of time and position
then Eq. (2.5.22) takes the form


$$\frac{\partial \nu}{\partial t} = D \nabla^2 \nu . \tag{2.5.24}$$

Here is one solution:

$$\nu = \frac{N}{(4\pi D t)^{3/2}} \exp\left(-\frac{\mathbf{x}^2}{4Dt}\right) , \tag{2.5.25}$$


where $N$ is a constant equal to the number $\int \nu\, d^3x$ of particles suspended in the
fluid. (This is one way of seeing that the coefficient $D$ defined by Eq. (2.5.23)
must be positive.) This distribution is spherically symmetric and localized
within a radius of order $\sqrt{4Dt}$, which spreads with time owing to the diffusion
of the suspended particles through the fluid.

A vivid description of how diffusion arises from the microscopic motion
of suspended particles was given in 1905 by Albert Einstein¹⁶ (1879-1955).
Consider a time interval $\tau$ that is short compared with the times over which
the distribution function changes appreciably but long enough that typical
suspended particles collide many times with the molecules of the fluid. In this
time the position of each suspended particle jumps by some random vector
amount $\boldsymbol{\Delta}$. These amounts differ from one suspended particle to another, in a
way that is governed by some sort of statistical distribution. Then for vanishing
bulk velocity $\mathbf{v}$, the number density $\nu(\mathbf{x}, t)$ is changed in this time interval to

$$\nu(\mathbf{x}, t+\tau) = \overline{\nu(\mathbf{x}+\boldsymbol{\Delta}, t)} ,$$

the bar indicating an average over the suspended particles. Assuming that $\nu(\mathbf{x}, t)$
is slowly varying over times of order $\tau$ and distances of order $|\boldsymbol{\Delta}|$, we can
expand both sides as Taylor series in $\tau$ and $\boldsymbol{\Delta}$:

$$\nu(\mathbf{x},t) + \tau\frac{\partial}{\partial t}\nu(\mathbf{x},t) + \cdots = \nu(\mathbf{x},t) + \sum_i \overline{\Delta_i}\,\frac{\partial \nu(\mathbf{x},t)}{\partial x_i} + \frac{1}{2}\sum_{ij}\overline{\Delta_i\Delta_j}\,\frac{\partial^2\nu(\mathbf{x},t)}{\partial x_i\,\partial x_j} + \cdots .$$

Under the assumption that all directions of $\boldsymbol{\Delta}$ are equally likely, we have

$$\overline{\Delta_i} = 0 , \qquad \overline{\Delta_i\Delta_j} = \frac{1}{3}\,\delta_{ij}\,\overline{|\boldsymbol{\Delta}|^2} ,$$

so, to leading order,

$$\tau\frac{\partial}{\partial t}\nu(\mathbf{x},t) = \frac{1}{6}\,\overline{|\boldsymbol{\Delta}|^2}\,\nabla^2\nu(\mathbf{x},t) .$$


16 A. Einstein, Ann. Phys. 17, 549 (1905).

Comparing this with the diffusion equation (2.5.24) for zero bulk velocity, we
see that the mean square displacement increases as

$$\overline{|\boldsymbol{\Delta}|^2} = 6\tau D . \tag{2.5.26}$$

This is in accord with the particular solution (2.5.25) of the diffusion equation;
calculating the integral $\overline{\mathbf{x}^2} = \int N^{-1}\nu(\mathbf{x},t)\,\mathbf{x}^2\,d^3x$ gives $\overline{\mathbf{x}^2} = 6Dt$.

The diffusion constant can be measured by observing this spreading out of 
the suspended particles with time. The calculation by Einstein of the diffusion 
constant D in terms of fundamental constants and its use to measure these 
constants are discussed in the next section. 
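
The relation (2.5.26) between random jumps and the diffusion constant can be checked with a small simulation. The following is only a sketch under stated assumptions: the jumps are taken to be Gaussian with independent components of variance $2D\tau$ (one convenient isotropic choice, not something specified in the text), and the function name and all numerical values are illustrative.

```python
import math
import random


def simulate_mean_square_displacement(diffusion_constant, tau, n_steps,
                                      n_particles, seed=0):
    """Return the average of |x|^2 after n_steps random jumps.

    Assumption: each jump has independent Gaussian components with
    variance 2 * D * tau, so the mean square jump is 6 * D * tau,
    as in Eq. (2.5.26).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusion_constant * tau)
    total = 0.0
    for _ in range(n_particles):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
            z += rng.gauss(0.0, sigma)
        total += x * x + y * y + z * z
    return total / n_particles


if __name__ == "__main__":
    # Illustrative values only (cm^2/s and s).
    D, tau, n_steps = 1.0e-8, 0.01, 50
    msd = simulate_mean_square_displacement(D, tau, n_steps, n_particles=2000)
    # The simulated value should be close to 6 D t with t = n_steps * tau.
    print(msd, 6.0 * D * tau * n_steps)
```

With a few thousand particles the simulated mean square displacement should agree with $6Dt$ to within a few percent, the residual scatter being statistical.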


2.6 The Atomic Scale 


We have seen various ways in which observations in the nineteenth century pro- 
vided physicists with the values of only the ratios of quantities that characterize 
the scale of individual atoms or molecules. The study of gases allowed
measurement of the gas constant $R = k/m_1$ (where $k$ is Boltzmann’s constant and $m_1$ is
the mass an atom would have if it had atomic weight unity, related to Avogadro’s
number by $N_A = 1/m_1$); the study of electrolysis allowed measurement of the
faraday, $F = e/m_1$ (where $e$ is the minimum electric charge that is transferred
in electrolysis); and, under the assumption that the charge of the electron is
the same as the unit $e$ of electric charge transferred in electrolysis, the study
of the bending of cathode rays allowed measurement of $e/m_e$. Furthermore,
under the assumption that molecules are tightly packed in liquids and solids,
knowledge of the mass density of a liquid or solid gave an approximate value
for the ratio of the mass to the volume of individual molecules. A measurement
of $m_1$ or $k$ or $e$ or $m_e$ or the size of any molecule would yield results for all these
quantities. No accurate measurements of any of these individual quantities were
possible before the twentieth century, which is not to say that nineteenth century 
chemists and physicists did not try. 


Nineteenth Century Estimates 


According to Eq. (2.5.21), the viscosity of a gas of known temperature and
molecular weight is given by known quantities times $m/\sigma$, where $m$ is the
mass of the gas molecules and $\sigma$ is their effective cross-sectional area.
Defining an effective radius $a$ by $\sigma = \pi a^2$ and setting the density in liquid form
equal to $m/(4\pi a^3/3)$ gave a rough estimate of both $a$ and $m$. In this way, in
1865 Josef Loschmidt (1821-1895) estimated that air molecules have a
diameter of about $10^{-7}$ cm, and in effect that $m_1 \approx 2 \times 10^{-23}$ g, about ten times
too large.

Using this and similar studies of gas properties, G. J. Stoney (1826-1911)
in 1874 estimated in effect that $m_1 \approx 10^{-25}$ g. Then, using the value of $e/m_1$
measured in electrolysis, he estimated that $e \approx 10^{-20}$ coulombs. He called this
the electrine.

Soon after the discovery of the electron, efforts were made to measure its 
charge directly. At Thomson’s Cavendish Laboratory, J. S. E. Townsend (1868- 
1957) studied falling clouds of water droplets that formed around electrically 
charged ions in gases produced in electrolysis. If the droplets have radius $a$
and mass $m$ then, as discussed in Section 2.5, they reach a terminal velocity
$mg/6\pi\eta a$ at which viscous drag balances gravity, where $\eta$ is the air viscosity.
Measuring the terminal velocity and air viscosity then gave a value for $m/a$.
A second relation between $m$ and $a$ was provided by the known density $\rho$ of
liquid water, which gives $m = 4\pi a^3\rho/3$, so both $m$ and $a$ were known. The
droplets were collected, and their total mass and charge were measured.
The ratio of the total mass to the known mass $m$ of each droplet gave the
number of droplets, and the ratio of the total electric charge to the number
of droplets then gave the charge per droplet. This charge was reported to be
always close to integer multiples of the same unit of charge, which Townsend
estimated to be $1.1 \times 10^{-19}$ coulombs, about 10 times the value found by Stoney.
Similar results, none very accurate, were obtained by Thomson himself and by 
H. A. Wilson (1874-1964). 
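
The two relations used in the droplet experiments, the terminal velocity $mg/6\pi\eta a$ and the mass $m = 4\pi a^3\rho/3$, can be solved together for $a$ and $m$. The sketch below does this in CGS units; the function name and the sample numbers are illustrative assumptions, and buoyancy in air is neglected, as in the text.

```python
import math


def droplet_radius_and_mass(terminal_velocity, viscosity, density, g=980.0):
    """Solve v = m g / (6 pi eta a) and m = 4 pi a^3 rho / 3 for a and m.

    All quantities are in CGS units (cm, g, s); buoyancy is neglected.
    """
    # Substituting m into the terminal-velocity condition gives
    # v = 2 rho g a^2 / (9 eta), so a = sqrt(9 eta v / (2 rho g)).
    radius = math.sqrt(9.0 * viscosity * terminal_velocity
                       / (2.0 * density * g))
    mass = 4.0 * math.pi * radius**3 * density / 3.0
    return radius, mass


if __name__ == "__main__":
    # Illustrative numbers: a water droplet falling at 0.01 cm/s in air
    # with an assumed viscosity of 1.8e-4 poise.
    a, m = droplet_radius_and_mass(0.01, 1.8e-4, 1.0)
    print(a, m)
```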

The early years of the twentieth century saw a great improvement in sci- 
entists’ knowledge of atomic magnitudes. This improvement came from three 
chief sources: 


• accurate direct measurement of the electric charges carried by oil droplets
gave a value for $e$;

• measurements of effects due to the diffusion of small spheres suspended in a
fluid gave a value for Avogadro’s number¹⁷ $N_A = 1/m_1$;

• the study of black body radiation gave a value for $k$.


Electronic Charge 


One of the problems with the water droplets studied by Townsend et al. in their 
estimates of the charge of the electron was that the masses of the droplets did 
not remain fixed during the experiment, because water evaporates. To avoid 
this, Robert Andrews Millikan (1868-1953) in 1906 studied individual oil 


17 This is the way in which these experimental results were quoted by physicists at the time and have
generally been described by historians since then, but it is misleading. The formulas used to analyze
these experiments actually involved $R/N_A$, where $R$ is the gas constant appearing in the ideal gas law
(1.2.3). Since $R$ had already been measured, the measured value of $R/N_A$ could be used to find $N_A$. But
since $R = kN_A$, they were really measuring $k$, not $N_A$. I suppose that the results were cited in terms of
$N_A$ rather than $k$ because Avogadro’s number was much more familiar to physicists of the time than the
Boltzmann constant of statistical mechanics.

drops that had picked up electric charge from air ionized by X-rays. Unlike the
water droplets in the experiments of Townsend et al., these oil drops were
large enough that Millikan could study the motion of individual drops. As in
the earlier experiments, Millikan could measure the mass $m$ and radius $a$ of
individual drops from their terminal velocity in the absence of any external
electric fields, using the known density of oil and viscosity of air. Then, when
he turned on a strong vertical electric field $E$, a drop carrying electric charge $q$
would feel an electric force $qE$ in addition to the gravitational force $mg$, so the
terminal velocity would be altered by an amount $qE/6\pi\eta a$. Measuring changes
in the terminal velocity, and knowing $m$ and $a$, it was then possible to calculate
the changes in the drops’ electric charges. For instance, in one run the changes
in the electric charge $q$ (in units of $10^{-19}$ coulombs) were


9.91, −11.61, 1.66, 5.00, 1.68, −8.31, 6.67, 5.02, etc.,


all close to integer multiples of $1.66 \times 10^{-19}$ coulombs. After repeated runs,
Millikan concluded that the fundamental unit of electric charge is
$e = (1.592 \pm 0.003) \times 10^{-19}$ coulombs. (The modern value is $e = 1.6021765 \times 10^{-19}$
coulombs.) This immediately allowed the calculation of $m_1$ (from the faraday
$e/m_1$), and then $k$ (from the ideal gas constant $k/m_1$), and so on. Even more
importantly, the observation that droplet charges come close to integer multiples 
of a unit charge gave direct evidence for the discreteness of electric charge. 
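
The statement that the quoted charge changes are close to integer multiples of a common unit can be checked directly. The sketch below is an illustration only, not Millikan's own analysis: it assigns each change the nearest integer multiple of a trial unit and then finds the least-squares unit. The function name and the fitting procedure are assumptions made for the example.

```python
def fit_unit_charge(charge_changes, trial_unit):
    """Least-squares unit charge given a rough trial value.

    Each change q_i is assigned the nearest integer n_i = round(q_i / trial);
    the unit e minimizing sum (q_i - n_i e)^2 is sum(q_i n_i) / sum(n_i^2).
    """
    multiples = [round(q / trial_unit) for q in charge_changes]
    numerator = sum(q * n for q, n in zip(charge_changes, multiples))
    denominator = sum(n * n for n in multiples)
    return numerator / denominator, multiples


# Changes in droplet charge quoted in the text, in units of 10^-19 coulombs.
changes = [9.91, -11.61, 1.66, 5.00, 1.68, -8.31, 6.67, 5.02]
unit, multiples = fit_unit_charge(changes, trial_unit=1.66)
print(multiples, unit)
```

For the run quoted above the fitted unit comes out very close to the trial value of 1.66 in these units.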


Brownian Motion 


The diffusion of particles suspended in a fluid depends on the size and shape 
of the particles, as well as on the fluid properties and fundamental constants. 
Where the particles are molecules, such as sugar molecules dissolved in water, 
it is not possible to deduce relevant information about their size and shape 
with any precision from the properties of solids or liquids composed of these 
molecules. In the first decade of the twentieth century Einstein had the idea 
of learning about fundamental constants from observation of the diffusion of 
artificial particles, like little spherical balls, whose shape, size, and mass were 
accurately known. (This diffusion is a special case of what is termed “Brownian 
motion,” after the botanist Robert Brown mentioned in the previous section.) 
Einstein took notice of the common observation that it is possible to have 
a time-independent inhomogeneous equilibrium distribution of particles such 
as little balls suspended in a fluid, in which the effect of diffusion is can- 
celled by a steady external force F acting on each ball. For example, this force 
could be the combined force of gravity and buoyancy, so that it has magnitude
$F = g(m - m_{\rm disp})$ (where $g$ is the gravitational acceleration, $m$ is the ball’s
mass, and $m_{\rm disp}$ is the mass of the fluid displaced by the ball), and it acts
in the $-z$ direction, where $z$ is altitude. In equilibrium this is balanced by a
kind of pressure, known as the osmotic pressure. With the balls in thermal
equilibrium with the fluid at a uniform temperature $T$, and therefore with a
kinetic energy given by the equipartition of energy as $3kT/2$, their random
motion, which is responsible for diffusion, exerts a pressure that, according
to the same arguments used to derive the ideal gas law (1.1.5), has the value
$p(z) = \nu(z)kT$, where $\nu(z)$ is the number of these balls per unit volume at
vertical coordinate $z$. In equilibrium at uniform absolute temperature $T$ the
balance of forces acting on such suspended particles in a slab of area $A$ between
altitudes $z$ and $z + dz$ then requires that

$$[\nu(z) - \nu(z+dz)]\,AkT = F \times \nu(z)\,A\,dz ,$$

and therefore the force creates a decrease in the density of little balls with
increasing altitude

$$\nu'(z)\,kT = -F\nu(z) . \tag{2.6.1}$$


Einstein pointed out that in addition to a balance of forces, in equilibrium
there has to be a balance of currents. According to Stokes’ law, the external force
$F$ acting on each little ball gives it a downward velocity $v = F/6\pi\eta a$, where $a$
is the ball radius and $\eta$ is the fluid viscosity. If not compensated by diffusion, this
would give these balls a current $-\nu v = -F\nu/6\pi\eta a$, the minus sign indicating
that this current is in the downward direction. But because $\nu$ decreases with
increasing altitude, diffusion produces an upward current given by Eq. (2.5.23)
as $-D\nu'(z)$. The cancellation of these two currents in equilibrium requires that

$$F\nu(z)/6\pi\eta a = -D\nu'(z) . \tag{2.6.2}$$

Einstein used Eq. (2.6.1) to eliminate the quantity $\nu'/F\nu$ in Eq. (2.6.2), and
concluded that¹⁸

$$D = \frac{kT}{6\pi\eta a} . \tag{2.6.3}$$


In the appendix to this section a more direct derivation of this formula is given 
(taking account of a possible correction) for the case where there is no external 
force, and diffusion is actually taking place. 

Unlike sugar molecules or the grains observed in Brownian motion, the arti- 
ficial little balls used for this purpose could be chosen to have a known uniform 
radius $a$, so by measuring $D$ at a given temperature $T$ in a fluid of known
viscosity $\eta$, it was possible to find the Boltzmann constant $k$.
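
As a rough numerical illustration of Eq. (2.6.3), the sketch below evaluates the diffusion constant for a small sphere and inverts the formula to recover $k$ from a measured $D$. The function names, the modern value of $k$, and the sample values (a ball of radius 0.5 micron in water with an assumed viscosity of about 0.01 poise at 293 K) are illustrative assumptions, not taken from the text.

```python
import math

K_BOLTZMANN = 1.380649e-16  # erg/K (CGS), modern value, assumed here


def einstein_diffusion_constant(temperature, viscosity, radius,
                                k=K_BOLTZMANN):
    """D = k T / (6 pi eta a), Eq. (2.6.3), in CGS units (cm^2/s)."""
    return k * temperature / (6.0 * math.pi * viscosity * radius)


def boltzmann_from_diffusion(diffusion_constant, temperature, viscosity,
                             radius):
    """Invert Eq. (2.6.3): k = 6 pi eta a D / T."""
    return (6.0 * math.pi * viscosity * radius * diffusion_constant
            / temperature)


if __name__ == "__main__":
    # Assumed illustrative values: T = 293 K, eta = 0.01 poise, a = 5e-5 cm.
    D = einstein_diffusion_constant(293.0, 0.01, 5.0e-5)
    print(D, boltzmann_from_diffusion(D, 293.0, 0.01, 5.0e-5))
```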


18 A. Einstein, “On the Motion of Small Particles Suspended in Liquids at Rest Required by the Molecular
Theory of Heat,” Ann. Phys. 17, 549 (1905). Because $F$ has dropped out of the final formula for $D$,
Einstein’s result is independent of the nature of the force acting on the suspended particles, though for 
simplicity we have assumed that this force is independent of position. 




Within a few years after Einstein’s 1905 paper, an experimental study of the 
diffusion of small bodies was carried out in Paris by Jean Perrin (1870-1942). 
Perrin measured k (or as he said, Avogadro’s number) by observing the decrease 
with altitude of the density of little balls suspended in a vertical column of fluid. 
Equation (2.6.1) has the elementary solution 


$$\nu(z) \propto \exp(-Fz/kT) . \tag{2.6.4}$$

(Perrin gave this solution in the form $\nu(z) \propto \exp[-N_A Fz/RT]$.) Using the
known value of the combined gravitational and buoyancy force $F$ on the little
balls gave a value for $N_A/R = 1/k$, which by using the known value of the gas
constant $R$, Perrin reported¹⁹ as a value for Avogadro’s number $N_A = R/k =
7.05 \times 10^{23}$/mole, corresponding to $m_1 = 1.42 \times 10^{-24}$ g. As was usual at the
time, no figure was given for the uncertainty of the measurement. 
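
The way a value of $N_A$ follows from the vertical distribution (2.6.4) can be sketched as follows. The function name and the sample numbers are hypothetical and are not Perrin's data; the gas constant is taken per mole in CGS units.

```python
import math


def avogadro_from_column(density_ratio, height_difference, force,
                         temperature, gas_constant=8.314e7):
    """N_A from nu(z1)/nu(z2) = exp[N_A F (z2 - z1) / (R T)].

    density_ratio is nu at the lower altitude divided by nu at the upper
    altitude; height_difference = z2 - z1 in cm; force in dynes;
    gas_constant in erg/(mole K).
    """
    return (gas_constant * temperature * math.log(density_ratio)
            / (force * height_difference))


if __name__ == "__main__":
    # Hypothetical numbers: the density falls by a factor 2 over 30 microns
    # for balls feeling a net force of 6.6e-12 dynes at 293 K.
    print(avogadro_from_column(2.0, 3.0e-3, 6.6e-12, 293.0))
```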

Perrin also used microscopic measurements over several minutes of the root 
mean square diffusion of suspended balls in the horizontal direction, in which 
no force is acting. He found that as expected the mean square displacement 
is proportional to the elapsed time. Using Eq. (2.5.26) gave a value for the 
diffusion constant $D$ and, using Einstein’s formula (2.6.3) (which Perrin like
Einstein wrote as $D = RT/6\pi\eta aN_A$), he found²⁰ that $N_A = 7.15 \times 10^{23}$/mole,
corresponding to $m_1 = 1.40 \times 10^{-24}$ g. The fair agreement of this result,
which was obtained by direct observation of diffusing particles, with Perrin’s
earlier measurement based on equilibrium in a vertical column gave support
to the view that diffusion is due to the motion of the balls in equilibrium with
randomly moving molecules. Perrin was not hesitant in concluding that his work
confirmed the reality of molecules; his results were summarized in a long
article²¹ titled “Brownian Movement and Molecular Reality.” His
measurements were not far off: with modern definitions of molecular weight, the value
of Avogadro’s number is $6.022142 \times 10^{23}$/mole.


Black Body Radiation 


As we will see in the next chapter, in 1900 the early ideas of Max Planck²²
(1858-1947) about quantum theory led to a formula for the distribution with
frequency of the radiation energy emitted by a totally absorbing body, which
depended on the value of $kT$. Comparison of Planck’s formula with observation
gave $k \approx 1.34 \times 10^{-16}$ erg/K.


19 J. Perrin, Comptes rendus cxlvi, 167 (1908) and cxlvii, 530 (1908).
20 J. Perrin, Comptes rendus cxlvii, 1044 (1908).
21 See Perrin, Brownian Movement and Molecular Reality, listed in the bibliography.
22 M. Planck, Verh. d. deutsche phys. Ges. 2, 202, 237 (1900).


Consistency


The atomic theory underlying these measurements of microscopic parameters 
and the values found gained much credit from the consistency of the results 
obtained. For instance, in 1901 Planck used his measurement of $k$ together with
the known value $R = 8.27 \times 10^7$ erg/mole K of the gas constant to calculate a
value for Avogadro’s number $N_A = R/k = 6.17 \times 10^{23}$/mole, in fair
agreement with Perrin’s later result $N_A \approx 7 \times 10^{23}$/mole. Planck also used this
result together with the known value of the faraday, $F = eN_A = 9.63 \times 10^4$
coulombs/mole, to calculate the unit of charge, $e = 1.56 \times 10^{-19}$ coulombs, in
very good agreement with Millikan’s result. 
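
The arithmetic behind this consistency check is short enough to repeat. The following lines simply redo the two divisions with the values quoted in the text; the variable names are illustrative.

```python
R_GAS = 8.27e7            # erg/(mole K), value of R quoted in the text
K_PLANCK_1901 = 1.34e-16  # erg/K, Planck's value of k quoted in the text
FARADAY = 9.63e4          # coulombs/mole, value of F quoted in the text

avogadro = R_GAS / K_PLANCK_1901   # N_A = R / k
unit_charge = FARADAY / avogadro   # e = F / N_A
print(f"N_A = {avogadro:.3g} per mole, e = {unit_charge:.3g} coulombs")
```

To three significant figures these reproduce the values of $N_A$ and $e$ attributed to Planck above.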

This happy agreement of fundamental constants led to a widespread accep- 
tance of the atomic theory of matter. For instance, the chemist F. W. Ostwald 
(1853-1932) had been a determined opponent of the atomic theory, but in 1908 
he finally admitted that “I am now convinced that we have recently become 
possessed of experimental evidence of the discrete or grained nature of matter, 
which the atomic hypothesis sought in vain for hundreds and thousands of 
years.” 

An adverse voice remained. The physicist-philosopher Ernst Mach (1838- 
1916), who spoke of “the artificial hypothetical atoms of chemistry and 
physics,” never accepted their existence. As late as 1916, shortly before his 
death, he declared that “I can accept the theory of relativity as little as I can 
accept the existence of atoms and other such dogmas.” This goes to show that 
a scientist can maintain his own principles, bravely holding out against a wide 
consensus of the scientific establishment, and still be wrong. 


Appendix: Einstein’s Diffusion Constant Rederived 


Einstein’s derivation of Eq. (2.6.3) for the diffusion constant D relied on the 
introduction of an external force F acting on suspended particles, which 
prevents their diffusion from disturbing a time-independent equilibrium particle 
distribution. The presence of an external force such as gravity is not
uncommon, but it ought to be possible to obtain the same result where there is 
no external force, and where diffusion is actually taking place. Below is such 
a derivation, which indicates the presence of a correction for particles whose 
mass is not negligible. 

The mean velocity $\mathbf{v}(\mathbf{x}, t)$ of diffusing suspended particles at position $\mathbf{x}$ and
time $t$ is given by setting the current (2.5.23) equal to $\nu\mathbf{v}$:

$$\mathbf{v}(\mathbf{x},t) = -\frac{D\nabla\nu(\mathbf{x},t)}{\nu(\mathbf{x},t)} . \tag{2.6.5}$$

According to Stokes’ law, spherical balls of radius $a$ with this mean
velocity experience a mean viscous drag force:

$$\mathbf{F}_{\rm vis} = -6\pi\eta a\,\mathbf{v} = \frac{6\pi\eta aD\nabla\nu}{\nu} , \tag{2.6.6}$$

with the signs indicating that the viscous force is in a direction opposite to that
of $\mathbf{v}$, and hence in the direction of the gradient of the particle number density.
Diffusion occurs because this drag is overcome by osmotic pressure. Following
the same reasoning that led to Eq. (2.6.1), if the gradient of $\nu$ is along the
$x$-direction, then the force due to an environment at uniform temperature $T$
on the particles in a small disk of area $dA$ and thickness $dx$ transverse to
the $x$-direction is the osmotic pressure force $dA\,kT[\nu(x,t) - \nu(x + dx,t)] =
-dA\,kT\,dx\,\partial\nu(x,t)/\partial x$ on the disk. Dividing this by the number $dA\,dx\,\nu(x, t)$
of suspended particles in the disk gives the osmotic pressure force on each
particle:

$$\frac{-dA\,kT\,dx\,\partial\nu(x,t)/\partial x}{dA\,dx\,\nu(x,t)} = -\frac{kT\,\partial\nu(x,t)/\partial x}{\nu(x,t)} .$$

Since in the absence of external forces there is nothing special about the
$x$-direction, for a gradient in a general direction we have

$$\mathbf{F}_{\rm osm} = -\frac{kT\nabla\nu}{\nu} . \tag{2.6.7}$$

Assuming that the viscous drag is cancelled by the osmotic pressure, we have

$$0 = \mathbf{F}_{\rm vis} + \mathbf{F}_{\rm osm} , \tag{2.6.8}$$

which gives Einstein’s formula (2.6.3):

$$D = \frac{kT}{6\pi\eta a} . \tag{2.6.9}$$


More generally, we should take into account the possibility that the viscous
drag is not precisely cancelled by the osmotic pressure. In this case, Newton’s
law gives

$$m\frac{d\mathbf{v}}{dt} = \mathbf{F}_{\rm vis} + \mathbf{F}_{\rm osm} , \tag{2.6.10}$$

where $m$ is the mass of the balls and the acceleration $d\mathbf{v}/dt$ is the total time
derivative of the mean velocity, due both to the change in mean velocity at a
fixed position and to the change in mean velocity of the particles carried from
one point to another at the mean velocity:

$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} . \tag{2.6.11}$$

Inspection of Eqs. (2.6.5), (2.6.11), (2.5.22), and (2.5.23) shows that the
magnitude of the acceleration can depend only on $D$ and on $L$, the scale of distances
over which $\nu$ varies appreciably. Dimensional analysis then tells us that it must
be of order

$$\left|\frac{d\mathbf{v}}{dt}\right| \approx \frac{D^2}{L^3} . \tag{2.6.12}$$


This shifts the value of $|\mathbf{F}_{\rm vis}|$ for a given $\mathbf{F}_{\rm osm}$ by an amount of order $mD^2/L^3$,
and hence shifts the value of the diffusion constant derived from Eq. (2.6.6) by
a fractional amount of order

$$\frac{\Delta D}{D} \approx \left(\frac{mD^2}{L^3}\right) \times \left(\frac{L}{6\pi\eta aD}\right) \approx \frac{mkT}{L^2(6\pi\eta a)^2} . \tag{2.6.13}$$

Einstein’s formula for $D$ is valid only if this is much less than one.

Einstein did not see this correction, because he was assuming that an
external force was preventing any mean motion, so that there were no inertial
forces. But the correction would affect the horizontal diffusion of suspended
balls in Perrin’s measurement of the diffusion constant. I do not know the
parameters in Perrin’s experiment, but the fact that he obtained close values for
Avogadro’s number from the measurement of horizontal diffusion and from the
measurement of the vertical distribution of suspended balls indicates that in his
experiment the correction (2.6.13) was not very large.

This is reassuring regarding the derivation of the Navier-Stokes equation
(2.5.16), in which it was assumed that terms of second order in the inverse
of the scale $L$ over which properties of the fluid vary can be neglected. Using
$mv^2 \approx kT$, where $v$ is a typical particle velocity, we see that the fractional
correction (2.6.13) is of order $(\ell/L)^2$, where $\ell = kT/6\pi\eta av$ is approximately
the distance in which viscous forces will bring a particle with radius $a$ and
velocity $v$ to rest. It is the ratios of just such microscopic lengths as $\ell$ to the
scale $L$ of macroscopic variation whose second and higher powers are dropped
in the derivation of the Navier-Stokes equation.


3 
Early Quantum Theory 


The early years of quantum theory were a time of guesswork, inspired by prob- 
lems presented by the properties of atoms and radiation and their interaction. 
This is the subject of the present chapter. Later, in the 1920s, this struggle led to 
the systematic theory known as quantum mechanics, the subject of Chapter 5. 


3.1 Black Body Radiation 


Quantum mechanics started with the problem of understanding radiation in 
thermal equilibrium at a non-zero temperature. We define $\mathcal{E}(\nu, T)\,d\nu$ as the
energy per volume of radiation with frequency between $\nu$ and $\nu + d\nu$ in an
enclosure with walls at uniform temperature $T$. As noted in 1859-1862 by
Gustav Robert Kirchhoff (1824-1887), this distribution is independent of any
property of the enclosure except for its temperature, because to change $\mathcal{E}(\nu, T)$
by changing the material or the shape of the enclosure would require taking 
energy from one frequency to another, while keeping the same temperature, 
which is impossible. 


Radiation Absorption, Emission, and Energy Density 


Kirchhoff called this “black body radiation.” This term refers to a relation 
between the energy density and the rates at which radiation is emitted and 
absorbed from any black heated surface. Consider radiation in an enclosure 
whose walls are at a uniform temperature $T$, and think how to calculate the
energy received by a small patch of area $dA$ on the inner walls of the enclosure.
At a point within the enclosure at a distance $r$ from this patch, the patch
subtends a solid angle $dA\cos\theta/r^2$, where $\theta$ is the angle between the line of
sight from the point to the patch and the normal to the patch. Hence a fraction
$dA\cos\theta/4\pi r^2$ of the radiation at this point is aimed at the patch. In a time $t$ all
the radiation at a distance $r < ct$ that is aimed at the patch will hit it (where $c$
is the speed of light), so the total rate $\Gamma(\nu, T)$ per unit time, per unit area, and
per unit frequency interval at which radiation energy at a frequency near $\nu$ hits
the patch will be

$$\Gamma(\nu,T) = \frac{2\pi}{t\,dA\,d\nu}\int_0^{\pi/2}\sin\theta\,d\theta\int_0^{ct} r^2\,dr\;\frac{dA\cos\theta}{4\pi r^2}\,\mathcal{E}(\nu,T)\,d\nu = \left(\frac{c}{4}\right)\mathcal{E}(\nu,T) . \tag{3.1.1}$$

Equilibrium requires that the rate per area of emission of radiation energy in
a frequency interval $d\nu$ must equal the rate per area of absorption of
radiation energy in that frequency interval, which is $(c/4) f(\nu, T)\,\mathcal{E}(\nu, T)\,d\nu$, where
$f(\nu, T) \le 1$ is the fraction of energy of radiation of frequency $\nu$ that is absorbed
when it hits the wall of the enclosure. The emission is evidently greatest for
“black” walls, which absorb all the radiation that falls on them, so that $f(\nu, T)$
takes its maximum value, $f(\nu, T) = 1$.

In the 1890s Eq. (3.1.1) was used at the Physikalisch-Technische
Reichsanstalt in Berlin to accurately measure $\mathcal{E}(\nu, T)$. This presented a challenge to
theorists, to understand the measured distribution $\mathcal{E}(\nu, T)$.


Electromagnetic Degrees of Freedom 


To use the equipartition of energy to calculate the radiation energy density 
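$\mathcal{E}(\nu, T)$ from first principles it is necessary to identify the degrees of freedom of
radiation among which energy is shared. The deepest understanding of radiation
at the beginning of the twentieth century was based on Maxwell’s equations. In
unrationalized electrostatic units these are

$$\begin{aligned}
\nabla\times\mathbf{B} - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} &= \frac{4\pi}{c}\mathbf{J} , & \nabla\cdot\mathbf{E} &= 4\pi\rho , \\
\nabla\times\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} &= 0 , & \nabla\cdot\mathbf{B} &= 0 ,
\end{aligned} \tag{3.1.2}$$

where $\mathbf{E}(\mathbf{x}, t)$ and $\mathbf{B}(\mathbf{x}, t)$ are the electric and magnetic fields, while $\mathbf{J}(\mathbf{x}, t)$ and
$\rho(\mathbf{x},t)$ are the electric current density and charge density. For empty space,
$\rho = \mathbf{J} = 0$, and Maxwell’s equations have solutions of the form

$$\begin{aligned}
\mathbf{E}(\mathbf{x}, t) &= \mathbf{e}\exp(i\mathbf{k}\cdot\mathbf{x} - i\omega t) + \text{c.c.} \\
\mathbf{B}(\mathbf{x}, t) &= \mathbf{b}\exp(i\mathbf{k}\cdot\mathbf{x} - i\omega t) + \text{c.c.}
\end{aligned} \tag{3.1.3}$$

where $\mathbf{k}$ and $\omega$ are real constants; $\mathbf{e}$ and $\mathbf{b}$ are complex constant
three-vectors; and c.c. denotes the complex conjugate of the preceding term. Since
in Eq. (3.1.3) we are including terms proportional to both $\exp(-i\omega t)$ and
$\exp(i\omega t)$, without loss of generality we can take $\omega > 0$. Inserting (3.1.3) into
(3.1.2), we see that this is a solution for $\rho = \mathbf{J} = 0$ if and only if

$$\begin{aligned}
\mathbf{k}\times\mathbf{b} + \frac{\omega}{c}\mathbf{e} &= 0 , & \mathbf{k}\cdot\mathbf{e} &= 0 , \\
\mathbf{k}\times\mathbf{e} - \frac{\omega}{c}\mathbf{b} &= 0 , & \mathbf{k}\cdot\mathbf{b} &= 0 .
\end{aligned} \tag{3.1.4}$$

Combining these, we have

$$\frac{\omega^2}{c^2}\mathbf{e} = -\frac{\omega}{c}\mathbf{k}\times\mathbf{b} = -\mathbf{k}\times[\mathbf{k}\times\mathbf{e}] = \mathbf{k}^2\mathbf{e} ; \tag{3.1.5}$$

so $\omega = |\mathbf{k}|c$, and electromagnetic radiation therefore propagates at the speed $c$.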

Now, we want to calculate the electromagnetic energy in a finite volume V. 
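Since $\mathcal{E}$ is universal, we can take our enclosure to be a cube, with edges
$L = V^{1/3}$ that lie along the 1-, 2-, and 3-directions. Whatever boundary
conditions the material of the enclosure imposes on the phases of the waves, it must
be the same on opposite sides of the cube, so the phase $\mathbf{k}\cdot\mathbf{x}$ can only change by
an integer multiple of $2\pi$ when $x_1$, $x_2$, or $x_3$ is shifted by $L$. That is, the wave
number $\mathbf{k}$ and frequency $\omega$ must take the form

$$\mathbf{k}_\mathbf{n} = (2\pi/L)\,\mathbf{n} , \qquad \omega_\mathbf{n} = c|\mathbf{k}_\mathbf{n}| , \tag{3.1.6}$$

where $\mathbf{n}$ is a vector with integer components $n_1$, $n_2$, and $n_3$. Hence the general
electric and magnetic fields in the enclosure are

$$\mathbf{E}(\mathbf{x},t) = \sum_\mathbf{n} \mathbf{e}(\mathbf{n})\exp(i\mathbf{k}_\mathbf{n}\cdot\mathbf{x} - i\omega_\mathbf{n}t) + \text{c.c.} \tag{3.1.7}$$

$$\mathbf{B}(\mathbf{x},t) = \sum_\mathbf{n} \left(\frac{c}{\omega_\mathbf{n}}\right)[\mathbf{k}_\mathbf{n}\times\mathbf{e}(\mathbf{n})]\exp(i\mathbf{k}_\mathbf{n}\cdot\mathbf{x} - i\omega_\mathbf{n}t) + \text{c.c.} , \tag{3.1.8}$$

where $\mathbf{e}(\mathbf{n})$ is, for each $\mathbf{n}$, a three-vector orthogonal to $\mathbf{n}$, and c.c. denotes the
complex conjugate of the previous term.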

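It is a well-known result of classical electrodynamics that the energy density
in radiation is $(\mathbf{E}^2 + \mathbf{B}^2)/8\pi$. To integrate this over the volume of the enclosure,
we use the orthogonality relations

$$\int_V d^3x\, e^{i(\mathbf{k}_\mathbf{n}-\mathbf{k}_\mathbf{m})\cdot\mathbf{x}} = \begin{cases} V & \mathbf{n} = \mathbf{m} \\ 0 & \mathbf{n} \neq \mathbf{m} , \end{cases} \qquad
\int_V d^3x\, e^{i(\mathbf{k}_\mathbf{n}+\mathbf{k}_\mathbf{m})\cdot\mathbf{x}} = \begin{cases} V & \mathbf{n} = -\mathbf{m} \\ 0 & \mathbf{n} \neq -\mathbf{m} . \end{cases} \tag{3.1.9}$$

(For instance, in one dimension for $n \neq m$,

$$\int_0^L dx\, e^{(2\pi i/L)(n-m)x} = \big(L/2\pi i(n-m)\big)\left[e^{2\pi i(n-m)} - 1\right] = 0 ,$$

while for $n = m$ it is just $L$. In three dimensions, the integral is a product of
similar factors.) It follows then that

$$\frac{1}{V}\int_V d^3x\,\mathbf{E}^2(\mathbf{x},t) = \sum_\mathbf{n}\mathbf{e}(\mathbf{n})\cdot\mathbf{e}(-\mathbf{n})\,e^{-2i\omega_\mathbf{n}t} + \sum_\mathbf{n}\mathbf{e}^*(\mathbf{n})\cdot\mathbf{e}^*(-\mathbf{n})\,e^{+2i\omega_\mathbf{n}t} + 2\sum_\mathbf{n}\mathbf{e}(\mathbf{n})\cdot\mathbf{e}^*(\mathbf{n}) ,$$

$$\begin{aligned}
\frac{1}{V}\int_V d^3x\,\mathbf{B}^2(\mathbf{x},t) = {}& -\sum_\mathbf{n}\left(\frac{c}{\omega_\mathbf{n}}\right)^2 (\mathbf{k}_\mathbf{n}\times\mathbf{e}(\mathbf{n}))\cdot(\mathbf{k}_\mathbf{n}\times\mathbf{e}(-\mathbf{n}))\,e^{-2i\omega_\mathbf{n}t} \\
& -\sum_\mathbf{n}\left(\frac{c}{\omega_\mathbf{n}}\right)^2 (\mathbf{k}_\mathbf{n}\times\mathbf{e}^*(\mathbf{n}))\cdot(\mathbf{k}_\mathbf{n}\times\mathbf{e}^*(-\mathbf{n}))\,e^{+2i\omega_\mathbf{n}t} \\
& +2\sum_\mathbf{n}\left(\frac{c}{\omega_\mathbf{n}}\right)^2 (\mathbf{k}_\mathbf{n}\times\mathbf{e}(\mathbf{n}))\cdot(\mathbf{k}_\mathbf{n}\times\mathbf{e}^*(\mathbf{n})) .
\end{aligned}$$

Noting that for $\mathbf{k}\cdot\mathbf{e} = \mathbf{k}\cdot\mathbf{e}' = 0$ we have $(\mathbf{k}\times\mathbf{e})\cdot(\mathbf{k}\times\mathbf{e}') = \mathbf{k}^2\,\mathbf{e}\cdot\mathbf{e}'$, and
noting also that $\omega_\mathbf{n}^2 = c^2\mathbf{k}_\mathbf{n}^2$, we see that the terms proportional to $\exp(-2i\omega_\mathbf{n}t)$
in the electric and magnetic energy cancel, as do the terms proportional to
$\exp(+2i\omega_\mathbf{n}t)$, leaving us with the total energy:

$$E = \frac{1}{8\pi}\int_V d^3x\,[\mathbf{E}^2(\mathbf{x}) + \mathbf{B}^2(\mathbf{x})] = \frac{V}{2\pi}\sum_\mathbf{n}\mathbf{e}(\mathbf{n})\cdot\mathbf{e}^*(\mathbf{n}) . \tag{3.1.10}$$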
82 Jy an 


There are two independent components of each e(n) orthogonal to n, each with 
independent real and imaginary terms, all four quantities for each n contributing 
independently to E, so there are four degrees of freedom for each n. 

We will assume that $L = V^{1/3}$ is much larger than the wavelengths $c/\nu$ under
consideration, so that the frequencies $\nu_\mathbf{n} = \omega_\mathbf{n}/2\pi$ are very close together
and we can replace sums over $\mathbf{n}$ with integrals over $\nu$. To count the
number of integer-component vectors $\mathbf{n}$ in a given range of frequencies, note that,
according to Eq. (3.1.6),

$$|\mathbf{n}| = |\mathbf{k}_\mathbf{n}|L/2\pi = \omega_\mathbf{n}L/2\pi c = \nu_\mathbf{n}L/c .$$

The number of allowed frequencies between $\nu$ and $\nu + d\nu$ therefore equals the
number of integer-component vectors $\mathbf{n}$ in a shell with $|\mathbf{n}|$ between $\nu L/c$ and
$(\nu + d\nu)L/c$. These vectors form a cubic lattice with lattice site width unity,
so the number $dN$ of these vectors in this shell just equals the volume of the
shell:

$$dN = 4\pi|\mathbf{n}|^2\,d|\mathbf{n}| = 4\pi(L/c)^3\nu^2\,d\nu = 4\pi V\nu^2\,d\nu/c^3 . \tag{3.1.11}$$
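
The claim that the number of integer-component vectors in a shell equals the shell's volume holds only approximately, for shells that are large compared with the lattice spacing. The brute-force count below is a sketch to illustrate this; the function names and the choice of shell radii are illustrative assumptions.

```python
import math


def count_lattice_vectors(r_inner, r_outer):
    """Count integer vectors n with r_inner <= |n| < r_outer by brute force."""
    limit = int(math.ceil(r_outer))
    inner_sq, outer_sq = r_inner**2, r_outer**2
    count = 0
    for n1 in range(-limit, limit + 1):
        for n2 in range(-limit, limit + 1):
            for n3 in range(-limit, limit + 1):
                if inner_sq <= n1 * n1 + n2 * n2 + n3 * n3 < outer_sq:
                    count += 1
    return count


def shell_volume(r_inner, r_outer):
    """Volume of the spherical shell between the two radii."""
    return 4.0 * math.pi * (r_outer**3 - r_inner**3) / 3.0


if __name__ == "__main__":
    # Compare the lattice count with the continuum volume for one shell.
    print(count_lattice_vectors(20, 25), shell_volume(20, 25))
```

The relative difference between the two numbers shrinks as the radii grow, which is the limit $L \gg c/\nu$ assumed in the text.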


With two polarizations for each $\mathbf{n}$, the total energy density per frequency interval
is then

$$\mathcal{E}(\nu,T) = \frac{8\pi\nu^2}{c^3}\,\overline{E}(\nu,T) , \tag{3.1.12}$$

where $\overline{E}(\nu,T)$ is the mean energy for each of the two complex polarization
vectors orthogonal to a wave vector $\mathbf{k}$ with a given value of $\nu = |\mathbf{k}|c/2\pi$.


The Rayleigh—Jeans Distribution 


In 1900 a calculation along these lines was presented by John William Strutt
(1842-1919), better known as Lord Rayleigh.¹ He used the result of classical
thermodynamics, described here in Section 2.4, that for systems whose total
energy can be expressed as a sum over degrees of freedom of squared
amplitudes, as in Eq. (3.1.10), each degree of freedom such as Re($\mathbf{e}$) and Im($\mathbf{e}$) for
a given polarization and wave vector contributes an energy $kT/2$, so the mean
total energy for a given polarization and wave vector is $\overline{E} = kT$. Using this in
Eq. (3.1.12) gives an energy density

$$\mathcal{E}(\nu,T) = \frac{8\pi kT\nu^2}{c^3} . \tag{3.1.13}$$

A more detailed derivation is given in the next section, to serve as a basis for
the modification introduced by Einstein.

Rayleigh had made a mistake of a factor 8, which was corrected in 1905
by James Jeans² (1877-1946); the result (3.1.13) is therefore known as the
Rayleigh-Jeans formula. Unfortunately a mere factor 8 was the least of
Rayleigh’s problems. If Eq. (3.1.13) held at all frequencies, however high,
then the total energy $E$ in a volume $V$ at any temperature $T > 0$ would be given
by a divergent integral:

$$E = \frac{8\pi kTV}{c^3}\int_0^\infty \nu^2\,d\nu = \infty ,$$

a result that became known as the ultraviolet catastrophe.


The Planck Distribution 


Meanwhile, back in Berlin, a different approach was being followed by Max 
Planck (1858-1947). Measurements indicated that $\mathcal{E}(\nu,T)$ increases as $\nu^2$
for small $\nu$, reaches a maximum at a frequency proportional to temperature,
and decreases more or less exponentially for large $\nu$. To fit this behavior, it
would be natural to guess that $\mathcal{E}(\nu,T) = CT\nu^2\exp(-C'\nu/T)$, with $C$ and
$C'$ some temperature-independent constants, which would give a total energy
$\int d\nu\,\mathcal{E}(\nu, T)$ proportional to $T^4$, which as we saw in Section 2.3 is required by


1 Lord Rayleigh, Phil. Mag. 49, 539 (1900); Nature 72, 54 (1905).
2 J. Jeans, Phil. Mag. 10, 91 (1905).

thermodynamics. But this formula would not agree with a more detailed result
of classical thermodynamics, known as the Wien displacement law.³ Planck in
1900 guessed the formula⁴

$$\mathcal{E}(\nu,T)\,d\nu = \frac{8\pi h}{c^3}\,\frac{\nu^3\,d\nu}{\exp(h\nu/kT) - 1} , \tag{3.1.14}$$

where $h$ and $k$ again are constants.

A little later in the same year, Planck published an attempted derivation⁵
of Eq. (3.1.14), which indicated that $k$ is Boltzmann’s constant, while $h$ is a
new constant, known ever since as Planck’s constant. To derive this formula, he
adopted a model of the wall of the enclosure whereby it consists of electrically 
charged harmonic oscillators with a wide range of frequencies, with the oscilla- 
tors of frequency v coming into equilibrium with the electromagnetic radiation 
of frequency v. Planck assumed that the energies of oscillators of frequency v 
can only take the form E = nhv, with n a positive integer. Planck calculated 
the radiation emitted by these oscillators when they are in thermal equilibrium at 
temperature $T$, and found that in order for them to absorb just as much radiation
as they emit, the radiation in the enclosure must have the energy density distri- 
bution given by Eq. (3.1.14). We will not go into Planck’s derivation because 
it was superseded a few years later with the modern derivation, due to Albert 
Einstein, described in the next section. 


Finding the Boltzmann Constant 


By comparing Eq. (3.1.14) with the Reichsanstalt data, Planck was able to infer
values for the Boltzmann and Planck constants. One set of early results was

$$k = 1.4 \times 10^{-16}\ \text{erg/K}\,, \qquad h = 6.6 \times 10^{-27}\ \text{erg sec}\,,$$

which compare well with the modern values,

$$k = 1.38062 \times 10^{-16}\ \text{erg/K}\,, \qquad h = 6.62620 \times 10^{-27}\ \text{erg sec}\,.$$

As described in Section 2.6, from his value of $k$ and the known gas constant
$R = kN_A$, Planck calculated a value for the Avogadro number $N_A$ (or equivalently,
for the mass $m_1 = 1/N_A$ of unit atomic weight) and from $N_A$ and the
known value of the faraday, $F = eN_A$, he calculated the electric charge $e$ carried
by singly charged ions in electrolysis.
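The chain of reasoning from $k$ to $N_A$, $m_1$, and $e$ can be sketched numerically. This is only an illustration: the values of the gas constant and the faraday below are modern CGS values chosen for the example (they are assumptions, not the values Planck had available), and the function name is hypothetical.

```python
# Modern CGS values, assumed here purely for illustration.
R_GAS = 8.314e7        # gas constant, erg / (mol K)
FARADAY = 2.8925e14    # faraday, esu / mol (about 96485 C/mol)

def avogadro_and_charge(k_boltzmann):
    """Return (N_A, m_1, e) from R = k N_A and F = e N_A."""
    n_a = R_GAS / k_boltzmann   # Avogadro number
    m_1 = 1.0 / n_a             # mass of unit atomic weight, in grams
    e = FARADAY / n_a           # charge of a singly charged ion, in esu
    return n_a, m_1, e

# An early value of k versus the modern value quoted in the text.
for k in (1.4e-16, 1.38062e-16):
    n_a, m_1, e = avogadro_and_charge(k)
    print(f"k = {k:.5e} erg/K: N_A = {n_a:.3e}, m_1 = {m_1:.3e} g, e = {e:.3e} esu")
```

A few percent error in $k$ propagates directly into $N_A$ and $e$, which is why the quality of the black body data mattered for these estimates.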


3 This result was derived by Wilhelm Wien (1864-1926) in 1893. It requires that the energy density
distribution must take the form $\mathcal{E}(\nu,T) = \nu^3 F(\nu/T)$ where $F$ is some function, of only the ratio $\nu/T$,
that is not dictated by thermodynamics alone. For a proof, see Appendix XXXIII of Born, Atomic Physics, 
listed in the bibliography. We will not be relying here on this result. 

4 M. Planck, Verhand. deutsch. phys. Ges. 2, 202 (1900). 

5 M. Planck, Verhand. deutsch. phys. Ges. 2, 237 (1900). 
The Rayleigh-Jeans formula (3.1.13) agrees with Planck’s for $h\nu \ll kT$, and
in fact gives the correct low-frequency limit of the energy density distribution.
It is an irony of history that in principle Rayleigh could have used the comparison
of his formula with the data for low frequency to find the value of $k$,
and then like Planck calculated the values of $m_1$ and $e$. For this, the quantum
hypothesis is unnecessary. This would have been difficult, for it is not easy to
fit experimental data for $\mathcal{E}(\nu,T)$ at low frequencies with a formula that is only
supposed to be valid at these frequencies, when the form of the distribution at
higher frequencies is not known. Anyway, it is just as well that Rayleigh did not
do this, as his factor of 8 mistake in $\mathcal{E}(\nu,T)$ would have led to the wrong results
for Avogadro’s number and the fundamental electric charge.
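The agreement of the two formulas at low frequency, and their disagreement at high frequency, can be checked numerically. The sketch below uses the constants quoted in the text plus an assumed value for the speed of light; the function names are illustrative only. The ratio of the Rayleigh-Jeans to the Planck energy density depends only on $x = h\nu/kT$ and equals $(e^x - 1)/x$.

```python
import math

H = 6.62620e-27   # Planck constant, erg sec (value quoted in the text)
K = 1.38062e-16   # Boltzmann constant, erg/K (value quoted in the text)
C = 2.997925e10   # speed of light, cm/sec (assumed value)

def rayleigh_jeans(nu, T):
    """Energy density per unit frequency, Eq. (3.1.13)."""
    return 8 * math.pi * K * T * nu**2 / C**3

def planck(nu, T):
    """Energy density per unit frequency, Eq. (3.1.14)."""
    return 8 * math.pi * H * nu**3 / C**3 / math.expm1(H * nu / (K * T))

def ratio(x, T=300.0):
    """Rayleigh-Jeans divided by Planck at h*nu/(k*T) = x."""
    nu = x * K * T / H
    return rayleigh_jeans(nu, T) / planck(nu, T)

for x in (0.01, 0.1, 1.0, 10.0):
    print(f"h nu / kT = {x:5.2f}: Rayleigh-Jeans / Planck = {ratio(x):.4g}")
```

The ratio stays close to 1 for $h\nu \ll kT$ and grows exponentially for $h\nu \gg kT$, which is the numerical face of the ultraviolet catastrophe.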


Radiation Energy Constant 


Unlike the Rayleigh—Jeans distribution, the Planck distribution gives a finite 
total energy density: 


$$\int_0^\infty \mathcal{E}(\nu,T)\,d\nu = aT^4\,. \tag{3.1.15}$$


This was in agreement with the known temperature dependence Eq. (2.3.13), 
which as we saw had been derived thermodynamically using the result of clas- 
sical electrodynamics that radiation pressure is one-third of the energy density. 
But now there was a value for the radiation energy constant: 


$$a = \frac{8\pi^5 k^4}{15h^3c^3}\,.$$

(Using modern values for $h$, $c$, and $k$, the constant $a$ has the value
$7.56577(5) \times 10^{-15}$ erg/cm³ K⁴.) According to Eq. (3.1.1), this also tells us that the total rate,
$\Gamma = \int \Gamma(\nu,T)\,d\nu$, per unit area and per unit time at which a black surface at
temperature $T$ emits radiation energy is $\Gamma = \sigma T^4$, where $\sigma = ca/4$ is another
constant, known as the Stefan-Boltzmann constant.
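As a rough check on these constants, the sketch below evaluates $a = 8\pi^5k^4/15h^3c^3$ and $\sigma = ca/4$, and compares $aT^4$ with a direct numerical integration of the Planck distribution (3.1.14). The value of $c$, the midpoint-rule integration, and the cutoff at $h\nu = 50\,kT$ are assumptions made for the example.

```python
import math

H = 6.62620e-27   # Planck constant, erg sec (value quoted in the text)
K = 1.38062e-16   # Boltzmann constant, erg/K (value quoted in the text)
C = 2.997925e10   # speed of light, cm/sec (assumed value)

def radiation_constant():
    """Radiation energy constant a, in erg / (cm^3 K^4)."""
    return 8 * math.pi**5 * K**4 / (15 * H**3 * C**3)

def planck_energy_density(nu, T):
    """Planck energy density per unit frequency, Eq. (3.1.14)."""
    return 8 * math.pi * H * nu**3 / C**3 / math.expm1(H * nu / (K * T))

def total_energy_density(T, steps=20000, x_max=50.0):
    """Midpoint-rule integral of the Planck distribution up to h*nu = x_max*k*T."""
    d_nu = x_max * K * T / H / steps
    return sum(planck_energy_density((i + 0.5) * d_nu, T) for i in range(steps)) * d_nu

a = radiation_constant()
sigma = C * a / 4   # Stefan-Boltzmann constant, erg / (cm^2 sec K^4)
T = 300.0
print(f"a = {a:.4e} erg/cm^3 K^4, sigma = {sigma:.4e} erg/cm^2 sec K^4")
print(f"numerical integral / (a T^4) = {total_energy_density(T) / (a * T**4):.6f}")
```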


3.2 Photons 


Quantization of Radiation Energy 


The modern interpretation of the Planck distribution (3.1.14) emerged from a 
heuristic conjecture of Albert Einstein⁶ in 1905. Planck had assumed a quantization
of the energies of the charged harmonic oscillators that he supposed
made up the walls of an enclosure. Einstein instead imposed the quantization 
on the radiation itself. 


6 A. Einstein, Ann. Phys. 17, 132 (1905). 
Confusingly, Einstein was not actually dealing with the Planck distribution,
but with an attempted fit to the data given earlier by Wilhelm Wien:

$$\mathcal{E}(\nu,T) \propto \nu^3\exp(-\beta\nu/T)\,,$$

where $\beta$ is a constant. Einstein used thermodynamic arguments to show that
this distribution would require that the energy of radiation at frequency $\nu$ must
be a whole number multiple of $\beta R\nu/N_A$. Physicists soon learned that $\mathcal{E}(\nu,T)$
is really given by the Planck distribution (3.1.14) and could interpret the Wien
distribution as the high-frequency limit of the Planck distribution, which for
large $\nu$ is proportional to $\nu^3\exp(-h\nu/kT)$. Thus $\beta$ in Einstein’s quantization
condition could be identified as $\beta = h/k$. With the gas constant $R$ equal to
$kN_A$, this means that the energy of the radiation at frequency $\nu$ must be a
whole number multiple of $(h/k)(R\nu/N_A) = h\nu$, the same rule as for Planck’s
mythical oscillating charges.


Derivation of Planck Distribution 


To see how Einstein’s assumption leads to the Planck distribution (not the Wien 
distribution), it is helpful to follow the reasoning a few years later of Hendrik 
Lorentz⁷ (1853-1928). For this, we will go back in more detail to the use by
Rayleigh and Jeans of the principle of equipartition of energy, which will make 
it easy to see the difference made by Einstein’s assumption of the quantization 
of radiation energy. 

Recall that by counting degrees of freedom, we found that the energy density 
per frequency interval is given by Eq. (3.1.12): 


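$$\mathcal{E}(\nu,T) = \frac{8\pi\nu^2}{c^3}\,\bar{E}(\nu,T)\,, \tag{3.2.1}$$

where $\bar{E}(\nu,T)$ is the mean energy of each polarization state of electromagnetic
waves of frequency $\nu$. According to Eq. (3.1.10) the energy $E_{\mathbf{n}\mathbf{e}}$ for a
given wave vector $\mathbf{k} = 2\pi\mathbf{n}/L$ and polarization vector $\mathbf{e}$ orthogonal to $\mathbf{n}$ in a
cubical box of volume $L^3$ is a sum of squares:

$$E_{\mathbf{n}\mathbf{e}} = \frac{L^3}{2\pi}\left[\left(\mathrm{Re}\,e(\mathbf{n})\right)^2 + \left(\mathrm{Im}\,e(\mathbf{n})\right)^2\right]\,.$$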


Therefore, in the same way as in Eq. (2.4.17), in classical statistical mechanics 
we would have a mean energy for each frequency and polarization, given by 


7 H. A. Lorentz, Phys. Z. 11, 1234 (1910).
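$$\bar{E}(\nu,T)_{\rm class} = \frac{\int_{-\infty}^{\infty}dX\int_{-\infty}^{\infty}dY\,(X^2+Y^2)\exp\left(-(X^2+Y^2)/kT\right)}{\int_{-\infty}^{\infty}dX\int_{-\infty}^{\infty}dY\,\exp\left(-(X^2+Y^2)/kT\right)}\,,$$

where

$$X = \sqrt{\frac{L^3}{2\pi}}\,\mathrm{Re}\,e(\mathbf{n})\,, \qquad Y = \sqrt{\frac{L^3}{2\pi}}\,\mathrm{Im}\,e(\mathbf{n})\,.$$

(The factor $L^3/2\pi$ in $dX\,dY$ is irrelevant, as it cancels between numerator and
denominator.) Defining $\theta$ and $E$ by $X = \sqrt{E}\cos\theta$ and $Y = \sqrt{E}\sin\theta$ and
integrating over $\theta$ gives

$$\bar{E}(\nu,T)_{\rm class} = \frac{\int_0^\infty 2\pi\sqrt{E}\,d\sqrt{E}\;E\exp(-E/kT)}{\int_0^\infty 2\pi\sqrt{E}\,d\sqrt{E}\,\exp(-E/kT)} = \frac{\int_0^\infty dE\,E\exp(-E/kT)}{\int_0^\infty dE\,\exp(-E/kT)} = kT\,. \tag{3.2.2}$$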


This is the classical equipartition result used by Rayleigh and Jeans, leading to 
the Rayleigh—Jeans energy density distribution (3.1.13). 

According to Einstein’s conjecture, the energy $E$ (not $X^2$ or $Y^2$) of each
polarization state can only take the values $nh\nu$, with $n = 0, 1, 2, \ldots$, so the
integrals in Eq. (3.2.2) must be replaced with sums. That is, according to 
Einstein, 


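$$\begin{aligned}
\bar{E}(\nu,T) &= \frac{\sum_{n=0}^\infty nh\nu\exp(-nh\nu/kT)}{\sum_{n=0}^\infty \exp(-nh\nu/kT)} = -\frac{d}{d(1/kT)}\ln\sum_{n=0}^\infty \exp(-nh\nu/kT) \\
&= -\frac{d}{d(1/kT)}\ln\left[1-\exp(-h\nu/kT)\right]^{-1} = \frac{h\nu\exp(-h\nu/kT)}{1-\exp(-h\nu/kT)} \\
&= \frac{h\nu}{\exp(h\nu/kT)-1}\,.
\end{aligned} \tag{3.2.3}$$

Using this in Eq. (3.2.1) gives an energy density distribution

$$\mathcal{E}(\nu,T) = \frac{8\pi h}{c^3}\,\frac{\nu^3}{\exp(h\nu/kT)-1}\,. \tag{3.2.4}$$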


This is the same as the Planck distribution (3.1.14), but derived from quite 
different assumptions. 
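The replacement of integrals by sums in Eq. (3.2.3) is easy to verify numerically. The sketch below (function names and the truncation of the sum are choices made for this illustration) compares a direct sum over $n$ with the closed form $h\nu/[\exp(h\nu/kT) - 1]$, working in units where energies are measured relative to $kT$.

```python
import math

def mean_energy_sum(h_nu, kT, n_max=5000):
    """Mean energy from a truncated sum over the allowed energies n*h_nu."""
    weights = [math.exp(-n * h_nu / kT) for n in range(n_max + 1)]
    return sum(n * h_nu * w for n, w in enumerate(weights)) / sum(weights)

def mean_energy_closed(h_nu, kT):
    """Closed form of Eq. (3.2.3)."""
    return h_nu / math.expm1(h_nu / kT)

kT = 1.0
for x in (0.01, 1.0, 10.0):
    h_nu = x * kT
    print(f"h nu / kT = {x:5.2f}: sum = {mean_energy_sum(h_nu, kT):.6f} kT, "
          f"closed form = {mean_energy_closed(h_nu, kT):.6f} kT")
```

For $h\nu \ll kT$ the mean energy approaches the equipartition value $kT$ of Eq. (3.2.2), while for $h\nu \gg kT$ it is exponentially suppressed.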

Einstein’s interpretation of his quantization assumption E = nhv was that 
the energy in radiation of frequency v comes in individual bundles, or “quanta,” 
each with energy hv. A state of this radiation with energy nhv is simply one 
containing n such quanta. This interpretation was soon confirmed by data on
the photoelectric effect. 


Photoelectric Effect 


Several physicists had observed in the late nineteenth century that electric 
charge is expelled from metal surfaces when the surfaces are exposed to 
ultraviolet light. After Thomson’s discovery of the electron in 1897, it was 
generally assumed that this charge was carried by electrons. A metal is a 
lattice of positively charged ions that have each lost one or more electrons, 
which circulate freely through the metal, accounting for the good electrical and 
thermal conductivity of metals. The positively charged metal ions produce an 
electrostatic potential, so that in normal circumstances it takes a definite energy 
(called the “work function”) $\phi$ to pull the negatively charged electrons out of
the metal. One might think that the more intense the radiation, the more energy 
is given to these electrons. In 1902 experiments by Philipp Lenard (1862-1947) 
showed that this is not the case. Instead, no matter how intense the radiation, 
no electrons are ejected from the metal unless the frequency exceeds a certain 
minimum (which is why photoelectricity was discovered using ultraviolet rather 
than visible light), and when that condition is met, the energy of each expelled 
electron increases with the frequency. Only the number of photoelectrons 
depends on the intensity of the radiation, not their individual energies. 

Einstein in his 1905 paper seized on these phenomena as evidence for his 
quantization assumption. Any electron expelled from the metal was assumed to 
have been struck by one of Einstein’s quanta. In order to get out of the metal, 
the energy $h\nu$ of the radiation quantum must at least equal the work function
$\phi$, so no electrons can be emitted unless $\nu > \phi/h$. If this condition is satisfied,
then the kinetic energy $E_e$ of the emitted electron will be given by the excess

$$E_e = h\nu - \phi\,. \tag{3.2.5}$$


These energies could be measured by observing how strong an electric field is 
needed to stop the electron emission by exerting a force toward the surface. In 
this way, Millikan at Chicago in 1914-1916 (while Europeans had other things 
on their minds) confirmed the form of the Einstein relation (3.2.5), and found 
a value for h, which turned out to agree with the value measured in studies of 
black body radiation. 
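A short numerical sketch of the Einstein relation (3.2.5) follows. The work function of 2.3 eV and the two wavelengths are illustrative assumptions, not values taken from the text, and the function name is hypothetical.

```python
H = 6.62620e-27          # Planck constant, erg sec (value quoted in the text)
C = 2.997925e10          # speed of light, cm/sec (assumed value)
ERG_PER_EV = 1.60218e-12 # conversion factor (assumed value)

def photoelectron_energy(nu, work_function):
    """Kinetic energy h*nu - phi of an emitted electron, or None below threshold."""
    excess = H * nu - work_function
    return excess if excess > 0 else None

phi = 2.3 * ERG_PER_EV   # illustrative work function, in erg
for wavelength_cm in (7.0e-5, 4.0e-5):   # 700 nm (red) and 400 nm (violet)
    nu = C / wavelength_cm
    energy = photoelectron_energy(nu, phi)
    if energy is None:
        print(f"{wavelength_cm * 1e7:.0f} nm: no emission, whatever the intensity")
    else:
        print(f"{wavelength_cm * 1e7:.0f} nm: E_e = {energy / ERG_PER_EV:.2f} eV")
```

The intensity of the light does not appear anywhere in the function: it would only change how many electrons are emitted per second, not the energy of each.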


Particles of Light 


As we shall see in Chapter 4, with the advent of special relativity it became 
clear that since Einstein’s quanta would have to travel at the speed of light, 
as particles they would have to have momenta equal to the energy divided by 
c, or hv/c. As discussed in Section 4.5, this was confirmed in experiments of 
Arthur Holly Compton (1892-1962) in 1922-1923 on the scattering of X-rays
by electrons in atoms; Compton’s measurements removed the last doubt about 
the existence of Einstein’s radiation quanta. A few years later, they were given 
their present name, photons. 


3.3 The Nuclear Atom 


It was not possible to make progress in applying quantum ideas to atoms without 
some understanding of what atoms are. The growth of this understanding began 
with the discovery of radioactivity. 


Radioactivity 


In 1896 Antoine Henri Becquerel (1852—1908) was trying to find whether var- 
ious crystals that had been exposed to sunlight would emit energetic radiation, 
like the X-rays that had been discovered a few months earlier. He put these 
crystals next to photographic plates wrapped in dark paper that would block 
sunlight but might not block rays emitted by the crystal that had earlier been 
exposed to the Sun. A wire mesh was inserted between the crystal and the 
paper, so that any exposure of the plate by these rays would show an image 
of the mesh. One of the crystals Becquerel intended to study was uranium 
potassium bisulphate, because it exhibits the phenomenon of phosphorescence, 
the delayed emission of light by substances such as the luminous paint on clock 
dials that have been exposed to bright light. At first, in February 1896, the 
skies in Paris were too cloudy to provide the needed sunlight, so Becquerel 
left his crystals and photographic plates in a drawer. When he took them out 
in early March, he found that the plates that had been left near the crystals 
containing uranium were exposed, showing clear images of the wire mesh, 
even though they had never been put in sunlight. In the following months he 
found that some sort of ray from various compounds of uranium would expose 
photographic plates, even when the crystals and plates were put together in 
lead-lined boxes. 

It was soon realized that this phenomenon was not limited to uranium com- 
pounds. In 1898 Marie Curie (1867-1934) showed that similar phenomena are 
produced by compounds of thorium, and she and Pierre Curie (1859-1906) were 
able to isolate a previously unknown element, radium, that was millions of times 
more active than uranium or thorium. The Curies gave this phenomenon the 
name radioactivity. 

Two different kinds of radioactivity were distinguished in the next few years 
by Ernest Rutherford (1871-1937). There are beta rays, which are about as pen- 
etrating as X-rays, and alpha rays, which cannot penetrate even very thin sheets 
of foil. (Gamma rays, which are energetic photons, were discovered later.) In
1898 Becquerel discovered that beta rays could be deflected by magnetic and 
electric fields. From the amount of the deflection he concluded that these rays 
are composed of particles with the same ratio of charge to mass as the cathode 
ray particles whose deflection had been measured by J. J. Thomson shortly 
before. Beta rays are in fact what later became known as electrons, but moving 
much faster than the electrons in cathode rays. It was harder to deflect alpha 
rays, but this was eventually accomplished by Rutherford. From the amount 
and direction of deflection, Rutherford concluded that these rays consist of 
positively charged particles, with a ratio of charge $e_\alpha$ to mass $m_\alpha$ equal to
half the ratio of charge to mass for hydrogen ions, as had been measured in
electrolysis. The lightest element heavier than hydrogen is helium, with atomic
weight about four times greater than hydrogen, so Rutherford guessed that alpha
rays are helium ions, with charge twice that of hydrogen ions. That is,

$$\frac{e_\alpha}{m_\alpha} \simeq \frac{2e}{4m_1} = \frac{e}{2m_1}\,.$$


This was finally confirmed in 1907 when Rutherford together with T. D. Royds 
was able to collect enough alpha particles from radioactive decay to show that 
the atoms that they form absorb light at the same spectral frequencies as helium. 

Once the particles in alpha and beta rays were identified respectively as 
helium ions and electrons, it became possible to use measurements of their 
deflection to find the particles’ energies. These energies were enormous, 
typically about a million times larger than the energies of photons emitted 
in ordinary chemical reactions such as burning. Studies of radioactivity by 
Rutherford with the chemist Frederick Soddy (1877-1956) at McGill Uni- 
versity showed that this energy is released when elements like radium and 
thorium spontaneously change to other elements, such as radon. But, for the 
understanding of atoms, the most important consequence of the discovery of 
radioactivity was that it provided highly energetic charged particles that could 
be used as probes of atomic structure. 


Discovery of the Atomic Nucleus 


After Thomson’s discovery of the electron, it was widely supposed that atoms 
are like puddings, in which negatively charged electrons swim like raisins in 
a smooth background of positive charge. This seemed at first to be verified by 
experiments at the laboratory of Ernest Rutherford at the University of Manch- 
ester, to which Rutherford had moved in 1907. Rutherford’s assistant Hans 
Geiger (1882-1945) used a beam of alpha particles from what he called “radium 
emanation” (radon 222, a product of the alpha decay of radium 226), collimated 
by letting the alpha particles pass through a small slit in a metal sheet through 
which the alpha particles emerged in a narrow beam. The beam was directed 
at a gold foil, thin enough that the alpha particles could penetrate the foil. 
The beam then struck a screen covered with zinc sulfide, which emits a flash
of light when hit by an energetic charged particle such as an alpha particle. If 
the gold atom really consisted only of the very light electrons in a continuum of 
positive charge, it would scatter the alpha particles only weakly. Geiger at first 
found flashes of light from an area only slightly larger than the geometric image 
of the slit, where the unscattered beam would have struck the screen, indicating 
the expected slight scattering.⁸

A better model was suggested in 1911 by Rutherford,⁹ on the basis of further
experiments in 1910 in his laboratory. Geiger and Ernst Marsden (1889-1970)
again used alpha rays emitted from a glass tube containing radon 222 gas.¹⁰
Rutherford for some reason asked Geiger and Marsden to see whether any alpha 
particles could be deflected at large angles, more than 90°, so that the particles 
would be reflected backwards from surfaces of gold or other metals, producing 
flashes of light in a zinc sulfide screen on the same side of the metal surface 
as the alpha particle source. To his surprise Rutherford learned that some alpha 
particles were scattered almost straight back from various metal surfaces: gold, 
lead, platinum, etc.¹¹


Nuclear Mass 


The observation of backward scattering immediately indicated that the alpha 
particles were repelled by something much heavier than an electron, heavier 
indeed than an alpha particle. Suppose two particles with masses $m_A$ and $m_B$
and initial velocities $v_A$ and $v_B$ along some line collide head on, emerging with
velocities $v'_A$ and $v'_B$ along the same line. (These $v$s can be positive or negative;
when two $v$s have the same sign the particles are going in the same direction;
if opposite signs, they are going in opposite directions.) The conservation of
momentum requires that

$$m_A v'_A + m_B v'_B = m_A v_A + m_B v_B\,,$$


while (as long as velocities are measured when the particles are sufficiently 
far apart that they exert no force on each other) the conservation of energy 
requires that 


8 H. Geiger, Proc. Roy. Soc. A 81, 141 (1908). This reference gives citations to earlier work of Rutherford
and others along the same lines.

9 E. Rutherford, Phil. Mag. 21, 669 (1911); 27, 488 (1914). The first article is reprinted in Beyer,
Foundations of Nuclear Physics, listed in the bibliography.

10 It is not clear whether these alpha particles were produced by the direct alpha decay of radon or in
the alpha decay of radon’s decay products. Without explanation Rutherford’s 1911 paper cited an alpha
particle velocity of $2.09 \times 10^9$ cm/sec. If this is accurate, then these alpha particles could not have been
those that are emitted in the decay of radon 222 to polonium 218, which have a velocity of $1.6 \times 10^9$
cm/sec. Polonium 218 decays into lead 214 with a half life of 3.1 minutes, producing an alpha particle
with velocity $1.7 \times 10^9$ cm/sec, and lead 214 then undergoes further decays. Rutherford’s estimate of an
alpha particle velocity of $2.09 \times 10^9$ cm/sec may have been just a guess.

11 H. Geiger and E. Marsden, Proc. Roy. Soc. A 82, 495 (1910).


$$m_A {v'_A}^2 + m_B {v'_B}^2 = m_A v_A^2 + m_B v_B^2\,.$$

We can use the first equation to express $v'_B$ in terms of the other velocities.
Using this in the second equation, we then have a quadratic equation for $v'_A$,
with coefficients depending on $v_A$ and $v_B$. Like any quadratic equation, this has
two solutions. If nothing changes in the collision then the conservation laws are
automatically satisfied, so one solution is obvious; without even writing down
the equation, we know that $v'_A = v_A$, $v'_B = v_B$ is a solution. Since there are
only two solutions, the other solution, for which something does happen in the
collision, is unique. Here it is:

$$v'_A = \frac{(m_A - m_B)v_A + 2m_B v_B}{m_A + m_B} \tag{3.3.1}$$

and, just interchanging the As and Bs,

$$v'_B = \frac{(m_B - m_A)v_B + 2m_A v_A}{m_A + m_B}\,.$$


It is easy to check that these do satisfy the conservation laws. 
In particular, suppose that particle B is initially at rest, so $v_B = 0$. Then

$$v'_A = \left(\frac{m_A - m_B}{m_A + m_B}\right)v_A\,.$$

So it is only possible for particle A to be reflected backward (that is, with the
sign of $v'_A$ opposite to that of $v_A$) in the collision with a particle B at rest if
$m_A - m_B$ is negative. Taking A to be the alpha particle, B to be whatever it is
in the atom encountered by the alpha particle, we see that whatever the forces
between them may be, the alpha particle must be repelled by something in the
atom heavier than itself.
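The head-on collision formulas can be checked directly. The sketch below is only an illustration: the function name is hypothetical, masses are in atomic-weight units, and the target masses (197 for a gold nucleus, about 1/1823 for an electron) are assumed values for the example. It confirms that momentum and energy are conserved and that an alpha particle bounces back only from a heavier target.

```python
def head_on_collision(m_a, m_b, v_a, v_b):
    """Final velocities (v_a', v_b') for a one-dimensional elastic collision, Eq. (3.3.1)."""
    total = m_a + m_b
    v_a_out = ((m_a - m_b) * v_a + 2 * m_b * v_b) / total
    v_b_out = ((m_b - m_a) * v_b + 2 * m_a * v_a) / total
    return v_a_out, v_b_out

m_alpha = 4.0
targets = {"electron": 1.0 / 1823.0, "gold nucleus": 197.0}
for name, m_target in targets.items():
    v_out, _ = head_on_collision(m_alpha, m_target, 1.0, 0.0)
    direction = "backward" if v_out < 0 else "forward"
    print(f"alpha on {name} at rest: v'/v = {v_out:+.4f} ({direction})")
```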


Nuclear Size 


The observations of alpha particles reflected backward also show that they
are repelled by something small. Here it is necessary to assume that at the
separations reached in these collisions, the force between the alpha particle
and whatever it is encountering is purely electrostatic. If we assume that the
alpha particle has charge $e_\alpha$ and mass $m_\alpha$, and is repelled by something heavy
with charge $Ze$, then the potential energy of the alpha particle at separation $r$
is $Zee_\alpha/r$. In order for the alpha particle to be brought momentarily to rest
before reversing direction, its initial kinetic energy $m_\alpha v_\alpha^2/2$ must be entirely
converted to potential energy, so it must at that moment reach a separation $r$
satisfying

$$Zee_\alpha/r = m_\alpha v_\alpha^2/2$$

or, in other words,

$$r = 2Ze\,(e_\alpha/m_\alpha)/v_\alpha^2\,. \tag{3.3.2}$$


Both $v_\alpha$ and $e_\alpha/m_\alpha$ could be measured, as Thomson had done for electrons, by
measuring the electric and magnetic deflection of the beam of alpha particles.
Rutherford cited a velocity $v_\alpha \simeq 2.09 \times 10^9$ cm/sec, and, as already mentioned,
$e_\alpha/m_\alpha \simeq e/2m_1 = 1/2$ faraday. Using these values in Eq. (3.3.2) gives the
separation when the alpha particle comes to rest as $r = 3Z \times 10^{-14}$ cm. Even
for $Z$ as large as 100, this is much smaller than the diameter $> 10^{-8}$ cm of
heavy atoms, estimated from the density of the metals and the mass $Am_1$ of
their atoms.
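The estimate $r \approx 3Z \times 10^{-14}$ cm can be reproduced from Eq. (3.3.2). In the sketch below the values of $e$ and $m_1$ are modern CGS values assumed for illustration, the velocity is the one Rutherford cited, and the function name is hypothetical.

```python
E_CHARGE = 4.803e-10   # unit of electric charge e, esu (assumed modern value)
M_UNIT = 1.6605e-24    # mass m_1 of unit atomic weight, g (assumed modern value)

def closest_approach(z, v_alpha=2.09e9):
    """Separation r (cm) at which a head-on alpha particle comes to rest, Eq. (3.3.2)."""
    charge_to_mass = E_CHARGE / (2 * M_UNIT)   # e_alpha / m_alpha = e / (2 m_1)
    return 2 * z * E_CHARGE * charge_to_mass / v_alpha**2

for z in (1, 79, 100):
    print(f"Z = {z:3d}: r = {closest_approach(z):.2e} cm")
```

Even for $Z = 100$ the result is several thousand times smaller than an atomic diameter of order $10^{-8}$ cm. Using the lower velocity $1.6 \times 10^9$ cm/sec mentioned in footnote 10 would increase $r$ by a factor of about 1.7, which does not change the conclusion.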

Rutherford jumped to the conclusion that the positive charge of any atom and 
most of its mass is concentrated in a small heavy nucleus, which was repelling 
the alpha particles in his experiment. Whether or not by chance, Rutherford 
announced this discovery at a session of the Manchester Literary and Philosoph- 
ical Society, the same organization at whose meeting Dalton had announced the 
law of combining weights a little more than a century earlier. 


Scattering Pattern 


Further experiments in Rutherford’s laboratory measured the rate $d\Gamma$ at which
alpha particles in a beam with flux $\Phi$ (in particles per unit time and per unit
area transverse to the beam) are scattered into any solid angle $d\Omega$ (that is,
into ranges $d\theta$ of angles to the initial direction and $d\phi$ of angles around the
initial direction, with $d\Omega = \sin\theta\,d\theta\,d\phi$). Rutherford compared the result with
a calculation using Newtonian dynamics to follow the hyperbolic orbits of alpha
particles in the electric field of a single charged nucleus and find into what area
$d\sigma$ transverse to the beam the alpha particles must be directed in order to be
scattered into the solid angle $d\Omega$. This gave the ratio of $d\sigma$ to $d\Omega$, known as the
differential cross section:

$$\frac{d\sigma}{d\Omega} = \frac{Z_\alpha^2 Z^2 e^4}{16E_\alpha^2\sin^4(\theta/2)}\,, \tag{3.3.3}$$

where $Z_\alpha e$ and $Ze$ are the electric charges respectively of the alpha particle and
the nucleus, and $E_\alpha$ is the initial alpha particle kinetic energy. Since any given
alpha particle can be anywhere in the beam, for a beam of transverse area $A$
the probability that a particular alpha particle will be aimed at the area $d\sigma$ for
scattering into $d\Omega$ by a single nucleus is $d\sigma/A$. If there are $N$ atomic nuclei in
the part of a metal surface within the area of the beam of alpha particles, then
the probability that a given alpha particle will be scattered into the solid angle
$d\Omega$ will be $N\,d\sigma/A$. With a flux $\Phi$, the number of alpha particles per second
hitting the metal surface is $\Phi A$, so the rate at which alpha particles are scattered
into the solid angle $d\Omega$ is

$$\Phi A \times N\,d\sigma/A = \Phi N\,\frac{d\sigma}{d\Omega}\,d\Omega\,.$$

The observed pattern of scattering at angles greater than 90° agreed with the
proportionality to $1/\sin^4(\theta/2)$ indicated by Eq. (3.3.3), confirming to Rutherford
that this was indeed Coulomb scattering by a heavy point charge.

Rutherford was lucky. He was calculating these probabilities using classical
mechanics and got the right answer, even though at these velocities and
separations quantum mechanics would normally be needed. Scattering by
inverse square law forces is special; it allows the use of classical mechanics
in some circumstances where for any other force it would be necessary to
use quantum mechanics. Equation (3.3.3) will be derived using quantum
mechanics in Section 5.6, so we will not trouble to repeat Rutherford’s classical
calculation here.

Of course, Rutherford’s discovery was made before the development of quantum
mechanics. The agreement of his experimental results with theory generally
convinced physicists of a new picture of the atom, that it consists of a small
heavy positively charged nucleus, around which electrons revolve like planets
around the Sun, held in orbit by electrostatic attraction, which in part had
already been guessed at in 1904 by Hantaro Nagaoka (1854-1950).
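The angular and charge dependence of the Rutherford differential cross section, $d\sigma/d\Omega = Z_\alpha^2Z^2e^4/16E_\alpha^2\sin^4(\theta/2)$, can be explored with a few lines of code. This is a sketch under stated assumptions: the value of $e$ is a modern CGS value, the kinetic energy is a rough figure for an alpha particle of velocity $1.6 \times 10^9$ cm/sec, and the function name is hypothetical.

```python
import math

E_CHARGE = 4.803e-10   # unit of electric charge e, esu (assumed modern value)

def rutherford_cross_section(theta, z_nucleus, kinetic_energy, z_alpha=2):
    """Differential cross section d(sigma)/d(Omega) in cm^2 per steradian.

    theta is the scattering angle in radians, kinetic_energy is in erg.
    """
    numerator = (z_alpha * z_nucleus * E_CHARGE**2) ** 2
    return numerator / (16 * kinetic_energy**2 * math.sin(theta / 2) ** 4)

energy = 8.5e-6   # erg, rough alpha particle kinetic energy (illustrative)
for degrees in (90, 120, 150, 180):
    value = rutherford_cross_section(math.radians(degrees), 79, energy)
    print(f"theta = {degrees:3d} deg: d sigma / d Omega = {value:.3e} cm^2/sr")
```

Because the cross section depends on the nuclear charge and the energy only through $Z^2/E_\alpha^2$, an error in the assumed alpha particle energy translates directly into an error in the inferred nuclear charge.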


Nuclear Charge 


In order for atomic theory to make contact with chemistry, it was essential to 
know the precise number of electrons in the atoms of various elements. For 
instance, as we shall see in Chapter 5, the dramatic difference in the chemical 
properties of chlorine and argon is almost entirely due to the fact that chlorine 
atoms contain 17 electrons while argon atoms contain 18. Because atoms are 
electrically neutral, knowing the electric charge of the nucleus tells us the num- 
ber of electrons: if the nuclear charge is Ze, the atom must contain Z electrons. 

Almost immediately after Rutherford in 1911 announced his conclusion 
about the existence of the nucleus, Antonius van den Broek (1870-1926) 
argued in a brief note¹² (apparently on the basis of the steady progression of
chemical properties with increasing atomic weight) that the nuclear charge in 
units of e equals the atomic number, defined as the position in the catalog of 
elements when they are listed in order of increasing atomic weight — that is, 
hydrogen, helium, lithium, and so on — but he had no experimental evidence for 
this hypothesis. 

Rutherford offered no opinion about this. In his 1911 article cited in 
footnote 9 he had used Eq. (3.3.3) (with $Z_\alpha = 2$, known from previous measurements
of the deflection of alpha particles by electric and magnetic fields)
together with several measurements by Geiger of the scattering by small angles 


12 A. van den Broek, Nature 87, 78 (1911). He later published a longer paper, Phys. Zeit. 14, 32 (1913).
of alpha particles in thin gold foil to derive a value of 97e or 114e for the charge
of the gold nucleus. The atomic number of gold is 79, so if Rutherford’s value 
for the charge of the gold nucleus had been correct it would have ruled out the 
equality of atomic number and the nuclear charge in units of e.¹³ As we shall
see in the next section, this equality was established in 1913 by measurements 
of the wavelengths of X-rays from various elements. 


3.4 Atomic Energy Levels 


Spectral Lines 


In Munich in 1814—1815 the optician Joseph Fraunhofer (1787-1826) observed 
that when light from the Sun is passed through a slit, focussed by a telescope, 
and then dispersed by a prism into a spectrum of colors, the spectrum is crossed 
with hundreds of dark lines, each an image of the slit. These lines were always 
found in the same places in the spectrum, each corresponding to a definite 
wavelength of light. It was realized that these dark lines must be caused by 
selective absorption of light as it passes from the hot solar surface through 
the cooler part of the Sun’s atmosphere. The same dark lines were seen in 
the spectrum of the Moon and bright stars. Similar observations of the light 
from flames and other terrestrial sources showed lines in the same places, some- 
times dark and sometimes bright, so it became possible to identify the elements 
producing these lines: sodium, iron, magnesium, calcium, etc. Some elements, 
such as helium, were discovered in this way on the Sun before they were found 
on Earth. 

By the end of the nineteenth century large books had been published for 
physicists and chemists, giving vast numbers of wavelengths for the spectral 
lines of various elements. The observation of spectra became a standard tool of 
astronomy and chemical analysis. But what could cause the atoms of a given 
element preferentially to emit and absorb light at only certain definite wave- 
lengths? Answering this question had to wait for a realistic model of atoms. 


Electron Orbits 


In classical electrodynamics the simple harmonic oscillation of a charged body 
produces electromagnetic radiation with the same frequency as the oscillating 


13 Rutherford’s over-estimate of the charge of the gold nucleus may have arisen because he was using a
wrong value for the velocity of the alpha particle in these experiments. As mentioned in footnote 10, in
the same paper Rutherford had given a value $2.09 \times 10^9$ cm/sec for the alpha particle velocity, while the
alpha particles from the decay of radon 222 actually have a velocity of $1.6 \times 10^9$ cm/sec. According to
Eq. (3.3.3) the scattering cross section depends on $Z/E_\alpha$, so by over-estimating the velocity of the alpha
particles he would be over-estimating the electric charge of the gold nucleus.
charge, and the charged body is also effective at absorbing radiation at that
frequency. After the discovery of the electron in 1897, as mentioned earlier it 
was widely supposed that atoms consist of electrons trapped in a smooth back- 
ground of positive charge, and it was natural to assume that the characteristic 
frequencies observed in atomic spectra are the frequencies with which these 
electrons can oscillate back and forth around their normal positions. 

Then, with the discovery of the nucleus discussed in the previous section, 
this picture was replaced with a planetary model of the atom, in which electrons 
circulate in orbits around the nucleus, like planets around the Sun only held 
in orbit by electrostatic rather than gravitational attraction. In classical elec- 
trodynamics the periodic motion of the electrically charged electrons would 
produce electromagnetic radiation, with a frequency for circular orbits equal 
to the frequency with which the electron goes around its orbit. 

For elliptical orbits matters are more complicated. While the Cartesian coor- 
dinates of an electron traveling at constant speed in a circular orbit are simple 
harmonic functions of time, and in classical electrodynamics the electron radi- 
ates at the corresponding frequency, for elliptical orbits the motion is periodic 
though not simple harmonic. The Cartesian coordinates for an orbit of period 
$1/\nu$ can still be expressed as Fourier series of simple harmonic terms proportional
to $\sin 2\pi n\nu t$ and $\cos 2\pi n\nu t$ with $n$ an integer, so the electron classically
radiates at all frequencies equal to whole number multiples of the frequency v 
of revolution. No such pattern is seen in actual spectra. 
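
As a rough numerical illustration of this point, the sketch below samples the $x$-coordinate of a Kepler ellipse over one period and projects it onto $\cos 2\pi n\nu t$. The function names, the eccentricity 0.5, and the sample count are illustrative choices that do not come from the text; the orbit is assumed to be an ordinary inverse-square ellipse with the time origin at closest approach.

```python
import math

def kepler_x(mean_anomaly, ecc, iterations=50):
    """x-coordinate of a Kepler ellipse in units of its semi-major axis.

    mean_anomaly is 2*pi*nu*t; ecc is the eccentricity (0 for a circle).
    """
    ecc_anomaly = mean_anomaly  # starting guess
    for _ in range(iterations):
        # Newton step for Kepler's equation: E - ecc*sin(E) = mean_anomaly
        f = ecc_anomaly - ecc * math.sin(ecc_anomaly) - mean_anomaly
        ecc_anomaly -= f / (1.0 - ecc * math.cos(ecc_anomaly))
    return math.cos(ecc_anomaly) - ecc

def harmonic_amplitudes(ecc, n_max=4, samples=512):
    """Amplitudes of the cos(2*pi*n*nu*t) terms in x(t) for n = 1..n_max."""
    phases = [2.0 * math.pi * k / samples for k in range(samples)]
    xs = [kepler_x(phase, ecc) for phase in phases]
    amplitudes = []
    for n in range(1, n_max + 1):
        projection = sum(x * math.cos(n * p) for x, p in zip(xs, phases))
        amplitudes.append(abs(2.0 * projection / samples))
    return amplitudes

circular = harmonic_amplitudes(0.0)
elliptical = harmonic_amplitudes(0.5)
print("circular  :", ["%.4f" % a for a in circular])
print("elliptical:", ["%.4f" % a for a in elliptical])
```

For the circular orbit only the $n = 1$ term survives, while the ellipse shows appreciable amplitudes at $n = 2, 3, \ldots$, which is the classical prediction of overtones at whole number multiples of $\nu$.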

Even if the orbits were all circular, this view of atomic spectra would have 
problems. One trouble with this picture is that classically the electrons would 
continually lose energy to radiation, bringing them closer to the nucleus and
thereby speeding up their revolution, hence replacing the discrete spectral line
with a continuum of frequencies. Even worse, classically there would be nothing 
to prevent electrons from spiraling onto the nucleus, so that there would be no 
stable atoms. Of course, one could simply assume that only certain orbits are 
possible, and that these are all stable. The frequencies of these allowed orbits 
would then correspond to the observed spectral lines. But there was another 
trouble even with this picture: it offered no explanation of a systematic property 
of observed spectral frequencies, known as the Ritz combination principle. 


The Combination Principle 


In 1908 the spectroscopist Walther Ritz (1878-1909) noticed a peculiar property
of the observed wavelengths of spectral lines:¹⁴ in any one atom, the frequencies
corresponding to the observed wavelengths of spectral lines are differences of a
smaller number of quantities, which he called terms. That is, if we label the $n$th
term as $\nu_n$, then the observed spectral frequencies are all of the form


14 W. Ritz, Phys. Z. 9, 521 (1908). 
$$\nu_{nm} = \nu_n - \nu_m , \tag{3.4.1}$$


with n and m equal to 1,2,3,... (This was traditionally expressed in terms of 
inverse wavelengths instead of frequencies, but the frequency of any wave is just 
the speed of light times the inverse wavelength, so this makes no difference.) 
Ritz could offer no explanation of this principle. 

The explanation of the Ritz principle and much else was provided in 1913,
during a visit to Rutherford’s Manchester laboratory, by a young Danish theorist,
Niels Bohr¹⁵ (1885-1962). He assumed that the states of an atom have energies in a
discrete set, labeled $E_n$ with $n$ running from one to infinity. These states are
stable, except for radiative transitions among them, whose rates are typically
much slower than the frequencies of spectral lines. When an atom makes a
transition from an energy $E_n$ to a smaller energy $E_m$, it emits a photon with
energy $E_n - E_m$ and hence with frequency

$$\nu_{nm} = (E_n - E_m)/h .$$


Similarly, for an atom to make a transition from an energy $E_m$ to a higher
energy $E_n$, it must absorb a photon with the same energy and frequency. These
are the transitions that produce the bright and dark lines observed in spectrographs.
Their frequencies match the results (3.4.1) given by the Ritz principle,
if we identify the “terms” $\nu_n$ as simply the energies $E_n$ of the various states,
divided by $h$.
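
The way Bohr's frequency condition reproduces the combination principle can be checked with a few lines of arithmetic. In the sketch below the three level energies are made-up sample values (any set of energies would do), and the function and constant names are illustrative.

```python
H_EV_SEC = 4.135667696e-15  # Planck's constant in eV sec

def line_frequency(e_upper_ev, e_lower_ev):
    """Frequency (Hz) of the photon emitted in a transition between two levels."""
    return (e_upper_ev - e_lower_ev) / H_EV_SEC

# Sample level energies in eV, chosen only for illustration.
levels = {1: -13.6, 2: -3.4, 3: -1.51}

nu_31 = line_frequency(levels[3], levels[1])
nu_32 = line_frequency(levels[3], levels[2])
nu_21 = line_frequency(levels[2], levels[1])

# Each frequency is a difference of "terms" E_n/h, so nu_31 = nu_32 + nu_21.
print(nu_31, nu_32 + nu_21)
```

Because every line is a difference of two terms, three levels give three lines of which only two are independent; this is the economy Ritz noticed in the data.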


Bohr’s Quantization Condition 


But what determines the energies $E_n$? Casting about for something to quantize,
Bohr noted that $h$ has the units of energy per frequency, which is the same as
the units of angular momentum, so Bohr guessed that the angular momenta of
atomic states are integer multiples of some quantity $\hbar$, similar in magnitude
to $h$. (Some readers may already know what $\hbar$ turned out to be. At first, Bohr
had no idea what it was, so until we see how Bohr figured this out, please forget
whatever you know about $\hbar$.)

The applications of Bohr’s quantization principle are simplest for one-electron
atoms, such as neutral hydrogen, singly ionized helium, etc. An electron with
velocity $v_n$ in a circular orbit of radius $r_n$ about a nucleus of charge $Ze$ has
angular momentum $m_e v_n r_n$, so Bohr’s quantization condition was

$$m_e v_n r_n = n\hbar , \tag{3.4.2}$$

with $n$ an integer running from one to infinity. A second relation between $v_n$
and $r_n$ is given by equating the electrostatic attraction $Ze^2/r_n^2$ to $m_e$ times the
centripetal acceleration $v_n^2/r_n$:

$$Ze^2/r_n^2 = m_e v_n^2/r_n , \tag{3.4.3}$$


15 N. Bohr, Phil. Mag. 26, 1, 476, 857 (1913); Nature 92, 231 (1913). 




just as for planets in the solar system, but of course with different constant
factors on each side of the equation. We can solve these two equations for radius
and velocity. Multiplying Eq. (3.4.3) by $r_n^3 m_e$ gives $r_n = (m_e v_n r_n)^2 / Ze^2 m_e$, so

$$r_n = \frac{n^2\hbar^2}{Ze^2 m_e} . \tag{3.4.4}$$

Using this back in Eq. (3.4.2) then gives

$$v_n = \frac{Ze^2}{n\hbar} . \tag{3.4.5}$$

The electron has total energy

$$E_n = \frac{m_e v_n^2}{2} - \frac{Ze^2}{r_n} = -\frac{Z^2 e^4 m_e}{2n^2\hbar^2} . \tag{3.4.6}$$


(By the way, it immediately follows from Eq. (3.4.3) that the kinetic energy is 
$-1/2$ times the potential energy. One consequence, already mentioned in the
previous section, is that classically when an electron in orbit loses energy the 
potential energy decreases, becoming more negative, so that the kinetic energy 
increases.) 


The Correspondence Principle 


Now, what is $\hbar$? To answer this, Bohr invoked what he called the correspondence
principle, that the larger a system is, the more closely it obeys classical
mechanics. From Eq. (3.4.4) we see that the large orbits are those with large $n$.
(Atoms with $n$ of order 100 have actually been studied experimentally.) For
$n \gg 1$, the energy emitted when a single-electron atom goes from state $n$ to
state $n - 1$ is

$$E_n - E_{n-1} = \frac{Z^2 e^4 m_e}{2\hbar^2}\left(\frac{1}{(n-1)^2} - \frac{1}{n^2}\right) \to \frac{Z^2 e^4 m_e}{2\hbar^2}\cdot\frac{2}{n^3} ,$$

so the frequency of the photon emitted in this transition must be

$$\nu = \frac{Z^2 e^4 m_e}{n^3\hbar^2 h} .$$

On the other hand, classically the frequency with which the electron goes around
its orbit is

$$\frac{v_n}{2\pi r_n} = \frac{Ze^2/n\hbar}{2\pi n^2\hbar^2/Ze^2 m_e} = \frac{Z^2 e^4 m_e}{2\pi n^3\hbar^3} .$$

In order for these two frequencies to be equal, as required by the correspondence
principle, we must have $\hbar^2 h = 2\pi\hbar^3$, and therefore

$$\hbar = h/2\pi \simeq 1.054 \times 10^{-27}\ \text{erg sec} \simeq 6.582 \times 10^{-16}\ \text{eV sec} . \tag{3.4.7}$$

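
The approach to the classical limit can be made concrete. Dividing the exact photon frequency $(E_n - E_{n-1})/h$ by the classical orbital frequency $v_n/2\pi r_n$, and setting $\hbar = h/2\pi$, all the constants cancel and leave a pure function of $n$. The sketch below evaluates that ratio; the function name is illustrative.

```python
def frequency_ratio(n):
    """(E_n - E_{n-1})/h divided by the orbital frequency v_n/(2 pi r_n),
    with hbar = h/(2 pi); Z, e, m_e and hbar cancel in the ratio."""
    return 0.5 * n**3 * (1.0 / (n - 1) ** 2 - 1.0 / n**2)

for n in (2, 10, 100, 10000):
    print(n, frequency_ratio(n))
```

The ratio is far from one for the lowest orbits and tends to one as $n$ grows, which is all the correspondence principle demands.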
Bohr could then give numerical values of parameters for one-electron atoms:

$$v_n \simeq \frac{Z}{137n} \times c , \qquad r_n \simeq \frac{n^2}{Z} \times 0.5292 \times 10^{-8}\ \text{cm} , \qquad E_n \simeq -\frac{Z^2}{n^2} \times 13.6\ \text{eV} . \tag{3.4.8}$$
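
These numbers follow directly from Eqs. (3.4.4)-(3.4.6). A quick check in Gaussian units is sketched below; the constant values are rounded modern ones supplied here as assumptions, and the function name is illustrative.

```python
# Rounded modern constants in Gaussian (cgs) units; assumed, not from the text.
E_CHARGE = 4.80320e-10     # electron charge, esu
M_E = 9.10938e-28          # electron mass, g
HBAR = 1.054572e-27        # erg sec
C = 2.997925e10            # cm/sec
ERG_PER_EV = 1.602177e-12

def bohr_orbit(n, Z=1):
    """Return (v_n/c, r_n in cm, E_n in eV) from Eqs. (3.4.4)-(3.4.6)."""
    r_n = n**2 * HBAR**2 / (Z * E_CHARGE**2 * M_E)
    v_n = Z * E_CHARGE**2 / (n * HBAR)
    e_n = -(Z**2) * E_CHARGE**4 * M_E / (2 * n**2 * HBAR**2)
    return v_n / C, r_n, e_n / ERG_PER_EV

beta, radius, energy = bohr_orbit(1)
print(1 / beta, radius, energy)
```

For `n = 1` and `Z = 1` this gives the familiar 1/137, half an angstrom, and 13.6 eV scales of Eq. (3.4.8).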


Comparison with Observed Spectra 


Bohr’s result for E, was in good agreement with measurements of spectral 
wavelengths. In 1885 the Swiss mathematician Johann Balmer (1825-1898) 
had noticed that the wavelengths of many of the lines in the visible spectrum 
of hydrogen are well fit by the formula 


$$\frac{1}{\lambda} \propto \frac{1}{4} - \frac{1}{n^2} , \qquad n = 3, 4, 5, \ldots$$


This was generalized in 1888 by the Swedish physicist Johannes Rydberg 
(1854-1919), to a general formula for the wavelengths of lines in the spectrum 
of neutral hydrogen: 


$$\frac{1}{\lambda} = R_H\left(\frac{1}{m^2} - \frac{1}{n^2}\right) , \qquad m = 1, 2, 3, \ldots , \quad n = m+1, m+2, \ldots ,$$


where $R_H \simeq 1.1 \times 10^5$ cm$^{-1}$ is a constant, later named the Rydberg constant
for hydrogen. The visible Balmer series is the case $m = 2$, while the infrared
series $m = 3$, $m = 4$, $m = 5$, etc. were later named for Paschen, Brackett, Pfund,
etc. The lines of the $m = 1$ series were predicted by Rydberg’s formula to
be in the ultraviolet, with wavelengths from 121.6 to 91.1 nm. It was not until
1903 that they were measured, studying hydrogen excited by electric currents,
by Theodore Lyman (1874-1954) at Harvard. These results for hydrogen may
have provided Ritz with inspiration to formulate his combination principle for 
all elements. 

For comparison with Rydberg’s formula, Bohr’s formula (3.4.6) for $E_n$ gave
the inverse wavelength of the photon emitted in a transition from energy level $n$
to energy level $m$: in an atom whose nucleus has charge $Ze$,

$$\frac{1}{\lambda} = \frac{E_n - E_m}{2\pi\hbar c} = \frac{Z^2 e^4 m_e}{4\pi\hbar^3 c}\left(\frac{1}{m^2} - \frac{1}{n^2}\right) , \tag{3.4.9}$$

which is the same for $Z = 1$ as Rydberg’s formula if we identify the Rydberg
constant for hydrogen as

$$R_H = \frac{e^4 m_e}{4\pi\hbar^3 c} . \tag{3.4.10}$$

Using the best values then available for the fundamental constants, Bohr
obtained a value for $R_H$ in agreement with the results from contemporary
spectroscopic measurements. (Using modern values for fundamental constants
gives $R_H = 13.605693009(84)$ eV$/hc = 1.0968 \times 10^5$ cm$^{-1}$.)
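
Rydberg's formula is easy to evaluate numerically. The sketch below uses the value of the Rydberg constant for hydrogen quoted above; the function name and the choice of lines are illustrative.

```python
R_H = 1.0968e5  # Rydberg constant for hydrogen in cm^-1, as quoted above

def wavelength_nm(n, m, rydberg=R_H):
    """Wavelength (nm) of the line for the transition n -> m (n > m)."""
    inverse_wavelength = rydberg * (1.0 / m**2 - 1.0 / n**2)  # cm^-1
    return 1.0e7 / inverse_wavelength  # 1 cm = 1e7 nm

balmer = [wavelength_nm(n, 2) for n in (3, 4, 5, 6)]  # visible lines, m = 2
lyman_alpha = wavelength_nm(2, 1)                     # longest m = 1 line
lyman_limit = 1.0e7 / R_H                             # n -> infinity, m = 1
print(balmer, lyman_alpha, lyman_limit)
```

The `m = 2` lines fall in the visible, while the whole `m = 1` series lies between the two ultraviolet wavelengths printed last.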


Reduced Mass 


But the agreement was not perfect. According to Eq. (3.4.6), all energies and
hence all frequencies in the spectrum of once-ionized helium should be $Z^2 = 4$
times larger than for neutral hydrogen, but experiment showed that the ratio
was actually larger than 4 by about 0.04%. Bohr realized that the source of this
discrepancy was that in order to take account of the motion of the nucleus the
formulas for energy and angular momentum of an electron in orbit around a
nucleus of mass $M$ should contain the reduced mass $\mu = m_e/(1 + m_e/M)$
in place of the electron mass itself. It is therefore the reduced mass that should
appear in Bohr’s formulas for energies and frequencies in place of $m_e$. All energies
and frequencies are thus larger for singly ionized helium than for hydrogen
by a factor

$$Z_{\rm He}^2/Z_{\rm H}^2 \times \mu_{\rm He}/\mu_{\rm H} = 4 \times \frac{1 + m_e/m_{\rm H}}{1 + m_e/m_{\rm He}} = 4 \times 1.00041 ,$$

in agreement with observation. Bohr’s success in getting this factor right was a 
key factor in convincing physicists of the correctness of his assumptions. 
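
The size of the reduced-mass correction can be verified directly. The nuclear masses below are rounded modern values in units of the electron mass, supplied here as assumptions; the function name is illustrative.

```python
# Nuclear masses in units of the electron mass (rounded modern values, assumed).
M_PROTON = 1836.15267   # hydrogen nucleus
M_ALPHA = 7294.29954    # helium-4 nucleus

def reduced_mass(nuclear_mass):
    """Reduced mass in units of m_e: mu = 1 / (1 + m_e/M)."""
    return 1.0 / (1.0 + 1.0 / nuclear_mass)

Z_HE, Z_H = 2, 1
factor = (Z_HE**2 / Z_H**2) * reduced_mass(M_ALPHA) / reduced_mass(M_PROTON)
print(factor / 4)
```

The printed ratio exceeds one by about four parts in ten thousand, the 0.04% excess seen in the helium spectrum.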
Incidentally, although Bohr’s formula (3.4.6) for hydrogen energy levels
(with the reduced mass in place of $m_e$) worked very well, the $n$ in this formula
is not quite equal to the angular momentum in units of $\hbar$, as Bohr had assumed.
We will see in Section 5.2 that in general there are several hydrogen states with
energies given by this formula with the same $n$, in which the electron has orbital
angular momenta $(n-1)\hbar$, $(n-2)\hbar$, ..., 0, but not $n\hbar$. The electrostatic attraction
exerted on electrons by the nucleus is not balanced solely by the centrifugal
force of motion in closed orbits, but by motions implicit in the wave nature of 
the electron. Although Bohr’s calculation of the energy levels in hydrogen has 
not survived as a correct derivation of the formula for these energies, Bohr made 
a contribution of permanent importance in using a hypothesis of discrete energy 
levels for electrons in all atoms to explain the existence of bright and dark lines 
in atomic spectra. 


Atomic Number 


The alpha particle scattering experiments in Rutherford’s laboratory had not 
settled the crucial question of the electric charge Ze of the atomic nucleus and 
its possible relation to the atomic number, which gives the order of an element 
in the list of elements in order of increasing atomic weight. One of the great 
achievements of the Bohr theory is that it made possible precise measurements 
of nuclear charge. 
Of course, Bohr’s formula (3.4.8) was strictly applicable only to one-electron
atoms, but under the approximation of spherical symmetry the electric field felt
by the innermost electrons in any atom arises entirely from the nucleus, not
from electrons farther out. Hence the energy of the photon emitted when an
electron falls from a state in which it is more or less at rest far from the atom
and has essentially zero energy to the innermost $n = 1$ orbit of any atom is
given by Bohr’s formula (3.4.8) as $-E_1 = 13.6\,Z^2$ eV. For $Z > 10$ this is an
X-ray energy.
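
The idealized estimate just described, and its inversion to read off a nuclear charge from a measured photon energy, can be sketched as follows. The function names are illustrative, and the calculation ignores every electron except the one making the transition, as the text does.

```python
RYDBERG_EV = 13.6  # hydrogen ground-state binding energy in eV, from Eq. (3.4.8)

def innermost_binding_energy_ev(Z):
    """-E_1 for a nucleus of charge Ze, ignoring all other electrons."""
    return RYDBERG_EV * Z**2

def charge_from_energy(energy_ev):
    """Invert -E_1 = 13.6 Z^2 eV to estimate Z from a measured energy."""
    return (energy_ev / RYDBERG_EV) ** 0.5

for Z in (10, 20, 30):
    print(Z, innermost_binding_energy_ev(Z))
```

Because the energy grows as $Z^2$, a modest precision in the measured energy translates into a sharp determination of $Z$.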

After publication of Bohr’s work in 1913, a young physicist at Manchester, 
Henry G. J. Moseley (1887-1915), set out to measure these energies. Instead of 
a prism he used a crystal that (as described in Section 5.1) preferentially reflects 
X-rays at certain angles that depend on the wavelength. His results¹⁶ for the
nuclear charge Ze are shown in the following table, along with the values then 
known for atomic weight A, which are close to the values accepted now: 


Element        Z        A
calcium      20.00    40.09
scandium       -      44.1
titanium     21.99    48.1
vanadium     22.96    51.06
chromium     23.98    52.0
manganese    24.99    54.93
iron         25.99    55.85
cobalt       27.00    58.97
nickel       28.04    58.68
copper       29.01    63.57
zinc         30.01    65.37


Two aspects of this table stand out dramatically. The first is that Z always 
turns out to be very close to an integer; the small discrepancies can be easily 
blamed on experimental uncertainties. That of course is what one expects, if Z 
is the number of electrons in the atom, but it reassured everyone that Moseley’s 
measurements were reliable. The second remarkable feature is that Z goes up 
by one unit as you go up one step in the list of elements according to atomic 
weight; there are no elements with atomic weights between 40 and 65 other 
than those listed here. (Nickel is an exception to the steady increase of A with 
Z, understood today as due to forces in the nucleus of nickel that make it 
unusually strongly bound, for a reason discussed in Section 6.3.) This tight 


16 H. G. J. Moseley, Phil. Mag. 26, 1024 (1913).

correspondence between atomic number and atomic weight goes beyond the
elements in the table. For instance, there are just 19 elements with atomic 
weights less than calcium, which has Z = 20. Thus with a few exceptions, 
one can find Z for any element just by making a list of all elements in order 
of increasing atomic weight; the atomic number, defined as the place of the 
element in that list, gives the number Z of electrons in the atom and the positive 
charge Ze of the nucleus. 

Incidentally, the Bohr theory also provides a rough idea of the sizes of all
atoms. The electric field felt by the outermost electron in any atom is largely
shielded by the $Z - 1$ electrons closer to the nucleus, so the radius of its orbit
is very crudely given by the Bohr result (3.4.8), only with $Z \simeq 1$. This is why
the sizes of the atoms of heavy elements are not very much larger than that
of the hydrogen atom, of order $10^{-8}$ cm. They are in fact somewhat larger,
because the radius $r_n$ increases with $n$, and for reasons we will learn in Chapter 5
the outermost electrons in heavy atoms have $n$ greater than 1.


Outstanding Questions 
Successful as it was, the Bohr theory raised a number of new questions. 


1. Why should angular momentum (or anything else) be quantized? 

2. How many atomic states are there for each energy? (It was already known 
that spectral lines could be split by exposing atoms to external electric and 
magnetic fields.) 

3. Above all, how should quantum theory be applied to states that cannot be
approximated as consisting of electrons moving in a fixed Coulomb potential?
This includes all molecules.


The solution of these problems had to wait until the advent of modern quantum 
mechanics in the 1920s. This is the subject of Chapter 5. 


3.5 Emission and Absorption of Radiation 


A and B Coefficients 


In 1917 Einstein returned to the theory of black body radiation,¹⁷ this time
combining it with the Bohr idea of quantized atomic energy states. Einstein
defined a quantity $A_m^n$ as the rate at which an atom will spontaneously make a
transition from a state $m$ of energy $E_m$ to a state $n$ of lower energy $E_n$, emitting
a photon of energy $E_m - E_n$. He also considered the absorption of photons


17 A. Einstein, Phys. Z. 18, 121 (1917), reprinted in English translation in Van der Waerden, Sources of
Quantum Mechanics, listed in the bibliography. 
from radiation (not necessarily black body radiation) with an energy density
$\mathcal{E}(\nu)\,d\nu$ at frequencies between $\nu$ and $\nu + d\nu$. The rate at which an individual
atom in such a field makes a transition from a state $n$ to a state $m$ of higher
energy is written as $B_n^m\,\mathcal{E}(\nu_{nm})$, where $\nu_{nm} = (E_m - E_n)/h$ is the frequency
of the absorbed photon. As we will see, Einstein also found it necessary to take
into account the possibility that the radiation would stimulate the emission of
photons of frequency $\nu_{nm}$ by the atom in transitions from a state $m$ to a state
$n$ of lower energy, at a rate written as $B_m^n\,\mathcal{E}(\nu_{nm})$. The coefficients $B_n^m$ and $B_m^n$,
like $A_m^n$, were assumed to depend only on the properties of individual atoms,
not on their temperature or any properties of the radiation.

Now, suppose the radiation is black body radiation, at a temperature $T$, with
which the atoms are in equilibrium. The energy density per frequency interval
of the radiation will be the function $\mathcal{E}(\nu, T)$ given by Eq. (3.2.4):

$$\mathcal{E}(\nu, T) = \frac{8\pi h}{c^3}\,\frac{\nu^3}{\exp(h\nu/kT) - 1} .$$


In equilibrium the rate at which atoms make a transition $m \to n$ from higher
to lower energy must equal the rate at which atoms make the reverse transition
$n \to m$:

$$N_m\left[A_m^n + B_m^n\,\mathcal{E}(\nu_{nm}, T)\right] = N_n B_n^m\,\mathcal{E}(\nu_{nm}, T) , \tag{3.5.1}$$


where $N_n$ and $N_m$ are the numbers of atoms in states $n$ and $m$. According to the
Boltzmann rule of classical statistical mechanics, at temperature $T$ the number
of atoms in a given state of energy $E$ is proportional to $\exp(-E/kT)$, so

$$N_m/N_n = \exp\left(-(E_m - E_n)/kT\right) = \exp\left(-h\nu_{nm}/kT\right) . \tag{3.5.2}$$

(It is important here to take the various $N_n$ as the numbers of atoms in the
individual states $n$, some of which may have precisely the same energy, rather
than the numbers of atoms in all states with energies $E_n$.) Putting this together,
we have

$$A_m^n = \frac{8\pi h}{c^3}\,\frac{\nu_{nm}^3}{\exp(h\nu_{nm}/kT) - 1}\left(\exp(h\nu_{nm}/kT)\,B_n^m - B_m^n\right) . \tag{3.5.3}$$


For this to be possible at all temperatures for temperature-independent A and B 
coefficients, these coefficients must evidently be related by 


$$B_m^n = B_n^m , \qquad A_m^n = \left(\frac{8\pi h\nu_{nm}^3}{c^3}\right) B_m^n . \tag{3.5.4}$$

Hence, knowing the rate at which a classical light wave of a given energy density 
is absorbed or stimulates emission by an atom, we can calculate the rate at which 
it spontaneously emits photons, an explicitly quantum process. 
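
One way to see that the relations (3.5.4) do the job is to check the balance condition (3.5.1) numerically at several temperatures. In the sketch below the value of `B`, the frequency, and the temperatures are arbitrary illustrative choices, and the function names are not from the text.

```python
import math

H = 6.62607015e-27   # Planck's constant, erg sec
K = 1.380649e-16     # Boltzmann's constant, erg/K
C = 2.99792458e10    # speed of light, cm/sec

def planck_density(nu, T):
    """Black-body energy density per unit frequency interval, Eq. (3.2.4)."""
    return (8 * math.pi * H / C**3) * nu**3 / math.expm1(H * nu / (K * T))

def balance_residual(nu, T, B=1.0):
    """Relative mismatch between the two sides of Eq. (3.5.1), per atom in
    the lower state, when A and B obey Eq. (3.5.4). B is an arbitrary value."""
    A = (8 * math.pi * H * nu**3 / C**3) * B
    boltzmann = math.exp(-H * nu / (K * T))        # N_m / N_n, Eq. (3.5.2)
    downward = boltzmann * (A + B * planck_density(nu, T))
    upward = B * planck_density(nu, T)
    return abs(downward - upward) / upward

for T in (300.0, 3000.0, 30000.0):
    print(T, balance_residual(5.0e14, T))
```

The residual stays at the level of floating-point rounding at every temperature, even though neither `A` nor `B` depends on `T`.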
Lasers


The phenomenon of stimulated emission makes possible the amplification of
beams of light in a laser. (This is an acronym for “light amplification by stimulated
emission of radiation.” Before lasers there were masers, in which it was
microwave radiation rather than visible light that was amplified by stimulated
emission.) Suppose a beam of light with energy density distribution $\mathcal{E}(\nu)$ passes
through a medium consisting of $N_n$ atoms at energy level $E_n$. Stimulated emission
from the first excited state $n = 2$ to the ground state $n = 1$ adds photons
of frequency $\nu_{12} = (E_2 - E_1)/h$ to the beam at a rate $N_2\,\mathcal{E}(\nu_{12})B_2^1$, but absorption
from the ground state removes photons at a rate $N_1\,\mathcal{E}(\nu_{12})B_1^2$, and since
$B_2^1 = B_1^2$ there will be a net addition of photons only in the case $N_2 > N_1$.
Unfortunately, such a population inversion never occurs in thermal equilibrium,
and cannot even be produced by exposing the atoms in their ground state to light
at the resonant frequency $\nu_{12}$. The net rate of change in the population of the
first excited state, labeled $n = 2$, due to spontaneous and stimulated emission
from the excited state and absorption from the ground state will be

$$\frac{dN_2}{dt} = -N_2\,\mathcal{E}(\nu_{12})B_2^1 - N_2 A_2^1 + N_1\,\mathcal{E}(\nu_{12})B_1^2$$

or, using the Einstein relations (3.5.4),

$$\frac{dN_2}{dt} = B_2^1\left[-N_2\left[\mathcal{E}(\nu_{12}) + 8\pi\nu_{12}^3 h/c^3\right] + N_1\,\mathcal{E}(\nu_{12})\right] . \tag{3.5.5}$$

If we start with $N_2 = 0$, then $N_2$ increases until it approaches a value $N_1/(1+\xi)$,
where $\xi = 8\pi\nu_{12}^3 h/\mathcal{E}(\nu_{12})c^3$, when $N_2$ becomes constant. Not only can this
process not produce a population inversion; because of spontaneous emission it
cannot even make $N_2$ as large as $N_1$.
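
The saturation described here can be seen by integrating Eq. (3.5.5) numerically. The sketch below is a minimal forward-Euler integration under stated assumptions: time is measured in units of the inverse absorption rate per atom, the total number of atoms in the two levels is held fixed (an assumption added here), and the value 0.5 for the ratio of spontaneous to stimulated emission rates is arbitrary.

```python
def pump_two_level(xi, total=1.0, dt=1.0e-3, steps=20000):
    """Euler-integrate Eq. (3.5.5) with time in units of 1/(B * energy density).

    xi is the ratio of spontaneous to stimulated emission rates defined above.
    Assumes N1 + N2 = total stays fixed and all atoms start in the ground state.
    Returns the final (N1, N2).
    """
    n2 = 0.0
    for _ in range(steps):
        n1 = total - n2
        n2 += dt * (-n2 * (1.0 + xi) + n1)
    return total - n2, n2

n1, n2 = pump_two_level(xi=0.5)
print(n2 / n1, 1.0 / (1.0 + 0.5))
```

However long the light is left on, the excited-state population levels off below the ground-state population, so resonant pumping of a two-level system cannot invert it.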

A population inversion can be produced in other ways, for instance by optical 
pumping, in which atoms are excited to some state, say n = 3, by absorption 
of light with frequency $\nu_{13} = (E_3 - E_1)/h$, and then spontaneously decay to
the state n = 2. This can also happen naturally. Masers have been observed in 
the accretion disks surrounding the centers of several galaxies, including NGC 
4258 and M33. 


Suppressed Absorption 


Stimulated emission can not only intensify emission lines, such as those from
masers; it can also suppress absorption lines. Consider a steady beam with
area $A$ of radiation moving in the $+x$-direction, with local energy density per
unit frequency interval $\mathcal{E}(\nu, x)$ at $x$. In the steady state, the rate of change of
energy per unit frequency interval $\mathcal{E}A\,dx$ in the slab between $x$ and $x + dx$ due
to atomic transitions $n \to m$ and $m \to n$ with $E_m - E_n = h\nu > 0$ must
be balanced by the difference in the rates at which radiation energy leaves and
enters the slab:

$$c\left[\mathcal{E}(\nu, x + dx) - \mathcal{E}(\nu, x)\right]A = h\nu\,\mathcal{E}(\nu, x)\left[-n_n B_n^m + n_m B_m^n\right]A\,dx ,$$

where $n_m$ and $n_n$ are the number densities of atoms in states $m$ and $n$, respectively.
The two terms in square brackets on the right arise respectively from
absorption and stimulated emission; we do not include a term for spontaneous
emission because the photons it produces leave the beam. If the medium is in
thermal equilibrium at temperature $T$ then $n_m/n_n = \exp(-h\nu/kT)$; so, since
$B_m^n = B_n^m$, the energy density per unit frequency interval along the beam must
satisfy

$$\frac{d\mathcal{E}(\nu, x)}{dx} = -\frac{h\nu}{c}\,\mathcal{E}(\nu, x)\,n_n B_n^m\left[1 - \exp(-h\nu/kT)\right] . \tag{3.5.6}$$


Thus, if $h\nu \ll kT$, stimulated emission suppresses the intensity of the absorption
line by a factor $h\nu/kT$. This is important for radio and microwave frequency
lines, like the famous “21-cm” line in hydrogen discussed in Section 5.4.
It has $h\nu/k = 0.068$ K, which is less even than the temperature of the cosmic
microwave background, so this absorption line is strongly suppressed by stim- 
ulated emission everywhere. Nevertheless, the absorption line is observed. Its 
intensity and Doppler shifts provide valuable information about the temperature 
and motion of hydrogen gas in galactic disks. 
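
The numbers for the 21-cm line can be reproduced in a few lines. The line frequency below is a modern value supplied as an assumption, and the gas temperatures are arbitrary illustrative choices.

```python
import math

H = 6.62607015e-34       # Planck's constant, J s
K = 1.380649e-23         # Boltzmann's constant, J/K
NU_21CM = 1.420405752e9  # frequency of the hydrogen 21-cm line, Hz (assumed)

t_star = H * NU_21CM / K  # h nu / k in kelvin

def absorption_factor(T):
    """Factor 1 - exp(-h nu / kT) appearing in Eq. (3.5.6)."""
    return -math.expm1(-t_star / T)

print("h nu / k =", t_star, "K")
for T in (2.7, 100.0, 1.0e4):
    print(T, absorption_factor(T), t_star / T)
```

For every temperature listed the factor is already very close to $h\nu/kT$, so the net absorption is a small remainder of two nearly cancelling processes.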


4 
Relativity 


We now turn to the special theory of relativity, introduced by Einstein in a pair 
of papers in 1905, the same year in which he postulated the quantization of 
radiation energy and showed how to use observations of diffusion to measure 
constants of microscopic physics. Special relativity revolutionized our ideas of 
space, time, and mass, and it gave the physicists of the twentieth century a 
paradigm for the incorporation of conditions of invariance into the fundamental 
principles of physics. 


4.1 Early Relativity 


Motion of the Earth 


The idea of the relativity of motion first appeared in medieval arguments over 
whether or not the Earth can be in motion. For no good reason, it had been 
proposed by the followers of the cult of Pythagoras in the fifth century BC that 
the Earth along with the Sun and planets was in orbit about some sort of central 
fire. A more sober proposal was made in the third century BC by the Hellenistic 
astronomer Aristarchus of Samos (ca. 310-230 BC).¹ From observations of the
Sun and Moon, he calculated that the Sun is much larger than the Earth. Accord- 
ing to a later book of Archimedes, Aristarchus concluded from the difference in 
their sizes that instead of the Sun going around the Earth it was more plausible 
to suppose that the Earth goes around the Sun. 

Better motivated was the idea that the Earth is rotating. It was not hard to 
see that the apparent rotation once a day from east to west of the Sun, Moon, 
planets, and stars could be neatly explained if instead the Earth were rotating on 
an axis once a day from west to east. At least one astronomer suggested this as 


1 Aristarchus, “On the Sizes and Distances of the Sun and Moon,” translated by T. L. Heath, in Aristarchus 
of Samos (Clarendon Press, Oxford, 1923). The calculations of Aristarchus are described in S. Weinberg, 
To Explain the World (HarperCollins Publishers, New York, 2015). 
early as the fourth century BC; it was Heraclides of Pontus (ca. 388-310 BC),
a student at Plato’s Academy at Athens. 

There is a classic argument against both the rotation and motion of the Earth, 
given originally by Aristotle, and picked up around 150 AD by the astronomer 
Claudius Ptolemy of Alexandria (ca. 100-170 AD). Ptolemy argued that if the 
surface of the Earth were in motion then an arrow shot straight up would not 
fall back to the same spot from which it was shot, as is observed, because 
while the arrow was in flight that spot would have moved some distance under 
the arrow. This argument was first countered in the mid-1300s AD by Nicole 
Oresme (1321-1382), bishop of Lisieux. Relying on the concept of impetus 
introduced by his teacher at the University of Paris Jean Buridan (1300-1358), 
Oresme argued that an arrow on the surface of the Earth would pick up an 
impetus from the Earth’s motion, which would keep it moving with the same 
horizontal component of velocity while going up and down in the air, so it 
would fall back to the same spot on Earth, despite the Earth’s motion. Sadly, 
whether from respect for the teachings of the Church or fear of its discipline, 
Oresme never publicly adopted the notion that the Earth really is in motion. But 
he had established that purely terrestrial observations cannot detect a possible 
motion of the Earth. 

It was not so obvious that the peculiar motion of the planets around the 
constellations of the zodiac, sometimes even seeming to reverse their motion, 
could be explained if the Earth were in orbit about the Sun, sometimes passing 
Mars or some other outer planet, and sometimes being passed by Venus or Mer- 
cury. As everyone knows, this was finally made clear in the 1540s by Nicolaus 
Copernicus (1473-1543). 


Relativity of Motion 


I don’t know if it was the writings of Oresme or similar ideas of their own, but 
Johannes Kepler (1571-1630) and Galileo Galilei (1564-1642) in their defense 
of Copernicanism were comfortable with the conclusion that there is no way 
that a uniformly moving observer without observing the surroundings can tell 
that he or she is in motion. It was generally understood that (in modern notation) 
if a first observer describes any event as having Cartesian space coordinates $x^i$
(with $i = 1, 2, 3$ or $x, y, z$) and time coordinate $t$, then a second observer who
moves with velocity $-\mathbf{u}$ with respect to the first will see the same event with
coordinates

$$x'^i = x^i + u^i t , \qquad t' = t , \tag{4.1.1}$$

because an object seen by the first observer with any time-independent coordinates
$x^i = a^i$ will seem to the second observer to be moving with velocity $+\mathbf{u}$,
with coordinates $x'^i = a^i + u^i t$.
Invariance under these transformations was built into Newton’s theory of
motion and gravitation. In a system of bodies acted on by their mutual gravitational
attraction, the equations of motion obeyed by the Cartesian coordinates
$x_N^i$ of the $N$th body are

$$\ddot{x}_N^i = \sum_{M \neq N} G m_M \frac{x_M^i - x_N^i}{|\mathbf{x}_M - \mathbf{x}_N|^3} , \tag{4.1.2}$$

where $G$ is Newton’s gravitational constant, $m_M$ is the mass of the $M$th body,
and $|\mathbf{x}_M - \mathbf{x}_N|^2 \equiv \sum_i (x_M^i - x_N^i)^2$.
These equations are invariant under the transformation (4.1.1), which here is

$$x_N^i \to x_N'^i = x_N^i + u^i t , \qquad t \to t' = t , \tag{4.1.3}$$


because the term $u^i t$ drops out in the second time derivative on the left-hand side
of Eq. (4.1.2) and does not appear in the differences of spatial coordinates on 
the right-hand side. The principle that the laws of nature are invariant under 
the transformations (4.1.1) is known as the principle of Galilean relativity. 
It is a good approximation for bodies moving at speeds much less than that 
of light. For instance, we saw in Section 2.5 how invariance under Galilean 
transformations is used to infer the equations of motion for imperfect fluids. 

The equations of motion (4.1.2) are of course also invariant under constant 
rotations of space coordinates and constant translations of space and time coor- 
dinates. The set of all these transformations and all their combinations is known 
as the Galileo group. 
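
The invariance of the right-hand side of Eq. (4.1.2) is simple to confirm numerically. In the sketch below the positions, masses, boost velocity, and the choice $G = 1$ are arbitrary illustrative values, and the function names are not from the text.

```python
def accelerations(positions, masses, G=1.0):
    """Right-hand side of Eq. (4.1.2) for each body; positions are 3-tuples.
    G = 1 and the sample data below are arbitrary illustrative choices."""
    result = []
    for n, x_n in enumerate(positions):
        acc = [0.0, 0.0, 0.0]
        for m, x_m in enumerate(positions):
            if m == n:
                continue
            diff = [x_m[i] - x_n[i] for i in range(3)]
            dist3 = sum(d * d for d in diff) ** 1.5
            for i in range(3):
                acc[i] += G * masses[m] * diff[i] / dist3
        result.append(acc)
    return result

def boost(positions, u, t):
    """Galilean transformation (4.1.3): x -> x + u t for every body."""
    return [tuple(x[i] + u[i] * t for i in range(3)) for x in positions]

positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
masses = [1.0, 2.0, 3.0]
original = accelerations(positions, masses)
boosted = accelerations(boost(positions, u=(0.3, -0.2, 0.1), t=5.0), masses)
print(original)
print(boosted)
```

The two lists agree up to rounding because only coordinate differences enter the force, and the common shift cancels in every difference.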


Speed of Light 


It is obvious that Maxwell’s equations are not invariant under the Galilean 
transformations (4.1.1). Maxwell’s equations tell us that light always travels 
at the same speed, which we call c. If a light wave moves along the 1-direction, 
the 1-coordinate of the wave front must have the time dependence 


$$x^1(t) = x^1(0) + ct . \tag{4.1.4}$$

But then if a second observer who moves in the $-1$-direction with speed $u$ uses
the coordinates (4.1.1), she will see the 1-coordinate of the wave front as

$$x'^1 = x^1(0) + (c + u)t , \tag{4.1.5}$$


so the wave would seem to travel faster or slower than the speed of light ac- 
cording to whether u is positive or negative. Observers can use any coordinate 
systems they like, but Eq. (4.1.5) shows that if Maxwell’s equations in the form 
(3.1.2) are found to hold when an observer uses coordinates $x^i, t$ then they
cannot hold in that form when she uses coordinates $x'^i, t'$.

Einstein worried about this as a young man. He was particularly concerned
with what a light wave would look like to an observer with $u = -c$ in our
example, that is, an observer moving with the light wave. He concluded that
the electric and magnetic fields would appear frozen in time, though varying 
with position along the ray. Needless to say, this is not a solution of the Maxwell 
equations. 

This problem did not worry Maxwell. In formulating his equations, he 
regarded the electric and magnetic fields as vibrations in an elastic medium, the 
aether. In this case one would not expect the equations to hold for observers 
moving with respect to the aether, any more than the equations for a sound wave 
traveling up and down in an organ pipe would seem the same to an observer 
flying up the pipe as to an observer at rest with respect to the pipe. Maxwell 
thought that his equations would apply only for observers at rest in the aether. 


Michelson–Morley Experiment


So, if electromagnetic waves are vibrations in the aether, can we measure the 
velocity of the Earth through the aether? The Earth’s orbital motion gives it 
a speed of 30 km/sec relative to the Sun, and the rotation of our galaxy gives 
the solar system a speed of about 200 km/sec relative to the galaxy’s center. 
These speeds are much less than the speed of light, 300000 km/sec, but not 
too small to be measured with a device known as a Michelson interferometer, 
invented by the American physicist Albert Michelson (1852-1931). (Michelson
interferometers have been used for many purposes since then, most recently in 
the detection of gravitational waves from distant coalescing black holes and/or 
neutron stars.) 

In 1887 Michelson and Edward Morley (1838-1923) set out to measure the
speed of the Earth through the aether in observations at the Case School of
Applied Science in Cleveland, where Michelson was then a professor. As a base for their interferometer,
they used a large stone disk floating on mercury, to allow an easy change in 
its orientation and also to give it some insulation from vibrations in the Earth. 
On this disk they placed a strong source of light, which sent a beam of light 
toward a half-silvered mirror set at 45° to the beam. (See Figure 4.1.) Half 
the beam went straight ahead to an ordinary mirror A at distance $L_A$ from the
half-silvered mirror, and half went at a right angle to another ordinary mirror B
at a distance $L_B$. From each of these two mirrors the beam was reflected back to
the half-silvered mirror. Some of the two reflected beams went together in the
direction opposite to the direction to mirror B, to a detector which measured
the intensity of the recombined beam. If it takes times $t_A$ and $t_B$ for the light to
travel from the half-silvered mirror M along the paths to mirrors A and B and
back again, then the intensity observed at the detector is proportional to


$$\left|\mathcal{A}_A e^{2\pi i\nu t_A} + \mathcal{A}_B e^{2\pi i\nu t_B}\right|^2 = |\mathcal{A}_A|^2 + |\mathcal{A}_B|^2 + 2|\mathcal{A}_A||\mathcal{A}_B|\cos\left(2\pi\nu(t_A - t_B) + \alpha\right) , \tag{4.1.6}$$

where $\mathcal{A}_A$ and $\mathcal{A}_B$ are the amplitudes that would be received from mirrors A and
B if $t_A$ and $t_B$ were negligible, $\alpha$ is the relative phase of these amplitudes, and $\nu$
is the light frequency. It is easy to arrange that $|\mathcal{A}_A|$ and $|\mathcal{A}_B|$ are approximately
equal, in which case the intensity (4.1.6) is quite sensitive to the argument of
the cosine. So we need to calculate the times $t_A$ and $t_B$ for various orientations
of the interferometer.

Figure 4.1 The interferometer used in the Michelson–Morley experiment,
seen from above.

Adopting the idea of an aether for the sake of argument, let us assume that
the Earth is traveling through the aether with a speed $v$, at an angle $\phi$ to the
direction of the interferometer’s incident light beam. To calculate $t_A$ and $t_B$ it is
easiest to work in the frame of reference at rest in the aether, in which the speed
of light according to Maxwell is $c$ in all directions. If the light takes a time $t_A^+$ to
travel from the half-silvered mirror M to mirror A and a time $t_A^-$ to travel back
from A to M, then in the time intervals $t_A^\pm$ it travels a distance $L_A \pm t_A^\pm v\cos\phi$
along its original direction (because during time $t_A^+$ the mirror A moves in
a direction away from M by a distance $t_A^+ v\cos\phi$ while in the time $t_A^-$ the
half-silvered mirror M moves in a direction toward A by a distance $t_A^- v\cos\phi$).
In both time intervals the light beam also moves at right angles to its original
direction by a distance $v t_A^\pm \sin\phi$. The total distance traveled in these time
intervals is then the hypotenuse of a right triangle with sides $L_A \pm t_A^\pm v\cos\phi$ and
$t_A^\pm v\sin\phi$, so

$$c\,t_A^\pm = \sqrt{(L_A \pm t_A^\pm v\cos\phi)^2 + (t_A^\pm v\sin\phi)^2} = \sqrt{L_A^2 \pm 2L_A t_A^\pm v\cos\phi + (t_A^\pm v)^2} .$$

Because $v$ is presumably much less than $c$, it will be enough to keep only terms
up to second order in $v$. We can then use the familiar expansion

$$ \sqrt{1+x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \cdots , $$

so that

$$ c\,t_A^\pm \simeq L_A \pm t_A^\pm v\cos\phi + \frac{1}{2L_A}(t_A^\pm)^2 v^2 (1 - \cos^2\phi) \simeq L_A \pm t_A^\pm v\cos\phi + \frac{1}{2} L_A \frac{v^2}{c^2}(1 - \cos^2\phi) $$

and therefore

$$ t_A^\pm \simeq \frac{L_A}{c \mp v\cos\phi}\left(1 + \frac{v^2}{2c^2}(1 - \cos^2\phi)\right) . $$

Adding these results for $t_A^+$ and $t_A^-$, we see that the terms of first order in $v/c$
cancel, leaving us with the second-order correction

$$ t_A \equiv t_A^+ + t_A^- \simeq \frac{2L_A c}{c^2 - v^2\cos^2\phi}\left(1 + \frac{v^2}{2c^2}(1 - \cos^2\phi)\right) \simeq \frac{2L_A}{c}\left(1 + \frac{v^2}{2c^2}(1 + \cos^2\phi)\right) . $$

Since we assumed that the line from the half-silvered mirror M to mirror A is at
an angle $\phi$ to the Earth’s velocity through the aether, the line from M to mirror
B is at an angle $90^\circ - \phi$ to the Earth’s velocity. We can therefore find $t_B$ by
simply replacing $\phi$ with $90^\circ - \phi$ and of course replacing $L_A$ with $L_B$:

$$ t_B \simeq \frac{2L_B}{c}\left(1 + \frac{v^2}{2c^2}(1 + \sin^2\phi)\right) . $$

The difference, which appears in Eq. (4.1.6), is then

$$ t_A - t_B \simeq \frac{2(L_A - L_B)}{c}\left(1 + \frac{3v^2}{4c^2}\right) + \frac{(L_A + L_B)}{c}\left(\frac{v^2}{2c^2}\right)\cos 2\phi . \qquad (4.1.7) $$
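
As a numerical sanity check on Eq. (4.1.7), the sketch below solves the quadratic equations for the outward and return legs exactly and compares the resulting time difference with the second-order formula. The 11 m arm length and the 30 km/sec speed are illustrative assumptions, and the function names are chosen here rather than taken from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def round_trip_exact(length, v, angle):
    """Exact round-trip time along one arm, computed in the aether frame.

    `angle` is the angle between the arm and the Earth's velocity. Each leg
    solves (c t)^2 = L^2 +/- 2 L t v cos(angle) + (t v)^2 for t > 0.
    """
    drift = length * v * math.cos(angle)
    root = math.sqrt(drift ** 2 + (C ** 2 - v ** 2) * length ** 2)
    t_out = (drift + root) / (C ** 2 - v ** 2)
    t_back = (-drift + root) / (C ** 2 - v ** 2)
    return t_out + t_back


def dt_exact(l_a, l_b, v, phi):
    """t_A - t_B, with arm B at angle 90 degrees - phi to the velocity."""
    return (round_trip_exact(l_a, v, phi)
            - round_trip_exact(l_b, v, math.pi / 2 - phi))


def dt_second_order(l_a, l_b, v, phi):
    """Right-hand side of Eq. (4.1.7)."""
    b2 = (v / C) ** 2
    return (2 * (l_a - l_b) / C * (1 + 0.75 * b2)
            + (l_a + l_b) / C * (b2 / 2) * math.cos(2 * phi))


if __name__ == "__main__":
    arm, speed = 11.0, 3.0e4  # assumed: 11 m arms, 30 km/sec
    for phi_deg in (0, 30, 60, 90):
        phi = math.radians(phi_deg)
        print(phi_deg, dt_exact(arm, arm, speed, phi),
              dt_second_order(arm, arm, speed, phi))
```

The two columns should agree up to terms of fourth order in $v/c$, which are far below the second-order swing of the time difference.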


There is no way that Michelson and Morley could know $L_A - L_B$ and $\phi$ accurately
enough to allow them to detect the presence of corrections proportional to
$v^2/c^2$ by measuring the intensity with a fixed orientation of their interferometer,
even if they knew the value of $\phi$ for that orientation, which of course they did
not since no one knew the direction of the Earth’s motion through the aether.
But if they rotated the interferometer through 180°, then $\cos 2\phi$ would vary
through the whole range from $-1$ to $+1$, so $t_A - t_B$ in Eq. (4.1.7) would vary by
an amount $(L_A + L_B)v^2/c^3$ and the argument of the cosine in Eq. (4.1.6) would
change by an amount $2\pi\nu(L_A + L_B)v^2/c^3$. This predicts an observable change
in the intensity (4.1.6) as the interferometer is rotated through 180°, provided
that $2\pi\nu(L_A + L_B)v^2/c^3$ is not much less than $2\pi$, or in other words provided
that $v^2/c^2$ is not too small compared with $c/\nu(L_A + L_B) = \lambda/(L_A + L_B)$,
where $\lambda = c/\nu$ is the light wavelength. In the Michelson-Morley experiment
(taking account of repeated reflections between the half-silvered mirror and the
other mirrors) $L_A + L_B$ was of order $10^3$ cm, while the wavelength $\lambda$ was a few
times $10^{-5}$ cm, so $\lambda/(L_A + L_B)$ was of order $10^{-8}$, and velocities roughly of
order $10^{-4}c = 30$ km/sec could be easily detected.
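
The order-of-magnitude estimate above can be reproduced in a few lines. The sketch below evaluates the swing of the cosine argument in Eq. (4.1.6), in units of $2\pi$, as $\cos 2\phi$ runs over its full range. The wavelength of $5\times10^{-5}$ cm is an assumed value standing in for "a few times $10^{-5}$ cm," and `fringe_shift` is a name chosen here.

```python
def fringe_shift(total_path_cm, wavelength_cm, beta):
    """Swing of the cosine argument in Eq. (4.1.6), in units of 2*pi.

    Equals nu * (L_A + L_B) * v^2 / c^3 = (L_A + L_B) / lambda * (v/c)^2,
    with beta = v/c.
    """
    return total_path_cm / wavelength_cm * beta ** 2


if __name__ == "__main__":
    path = 1.0e3          # L_A + L_B, of order 10^3 cm
    wavelength = 5.0e-5   # assumed value, "a few times 10^-5 cm"
    for v_km_s in (30.0, 5.0):
        beta = v_km_s / 3.0e5
        print(v_km_s, fringe_shift(path, wavelength, beta))
```

With these assumed inputs, a speed of 30 km/sec shifts the pattern by about a fifth of a fringe, while 5 km/sec gives well under a hundredth.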

Finding no change in the intensity (4.1.6) as the interferometer was rotated,
Michelson and Morley concluded in 1887 that the velocity of light as observed
from the moving Earth is the same in all directions to within 5 km/sec.² That
is, within the aether theory of that time, the speed v of the Earth relative to the
aether would have to be less than 5 km/sec, as compared with the undoubted
orbital velocity of the Earth relative to the Sun of 30 km/sec. By 1964, with
the use of a laser instead of an incoherent light source, the upper limit on this
velocity had been reduced to about 1 km/sec.³ Even if one imagined that on
a particular day the Earth happened to be more or less at rest in the aether, 
six months later the Earth would be moving in the opposite direction, with the 
same speed relative to the Sun, and hence with a speed of 60 km/sec relative to 
the aether. 

This surprising result evoked various explanations. H. A. Lorentz⁴ in 1892
and George Francis Fitzgerald (1851-1901) at about the same time proposed
that motion through the aether causes a contraction of the dimension of the
interferometer along the direction of motion, just such as to hide the effect of
motion on the speed of light. Lorentz, acting on the assumption that all matter
consists of electrons, tried to explain this “Lorentz-Fitzgerald contraction”
within a theory of the electron. Similar ideas were elaborated by the polymath
Henri Poincaré⁵ (1854-1912). But it was Albert Einstein in 1905 who put his
finger on the solution.


4.2 Einsteinian Relativity 


Physicists in the first years of the twentieth century were in a strange bind. 
Newton’s equations (4.1.2) of matter and gravitation are invariant under the
Galilean transformation (4.1.1), while Maxwell’s equations are not. That in
itself was not so bad: it was possible to believe, as Maxwell did believe, that his
equations only describe electromagnetism in one frame of reference, supposed
to be the one at rest in the aether. But as we saw in the previous section, it
was not possible to detect any effect of motion relative to the aether on the
speed of light.

2 A. Michelson and E. W. Morley, Am. J. Sci. 34, 333 (1887).
3 T. S. Jaseja, A. Javan, J. Murray, and C. H. Townes, Phys. Rev. 133, A1221 (1964).
4 H. A. Lorentz, Versl. Kon. Akad. Wetensch. Amsterdam 1, 74 (1892).
5 H. Poincaré, Rendiconti del Circolo Matematico di Palermo 21, 129 (1906).


Postulate of Invariance 


Einstein’s solution to this conundrum was presented in 1905 in an article⁶ “On
the Electrodynamics of Moving Bodies.” As suggested by the title, part of his 
motivation was a peculiar feature of electrodynamics. Consider a magnet mov- 
ing past a conducting wire. To an observer at rest with respect to the wire, 
the changing magnetic field produces an electric field, which, as in an electric 
generator, drives a current in the wire. On the other hand, to an observer at 
rest with respect to the magnet, there is no electric field; instead the motion 
of the wire with velocity $\mathbf{v}$ through the magnetic field $\mathbf{B}$ produces a force per
charge $\mathbf{v}\times\mathbf{B}/c$ that drives a current in the wire. Somehow the current is the
same, although the two observers use different language to describe what is 
happening. So at least some electromagnetic phenomena are unaffected by the 
motion of the observer. 

Einstein also mentioned in passing “unsuccessful attempts to detect a motion 
of the Earth” relative to what he called “the light medium,” but did not give a 
reference to the Michelson—Morley experiment. In his 1905 paper he rejected 
the idea of Lorentz and Fitzgerald that the change in the speed of light due to the 
transformation (4.1.1) is somehow hidden from us by changes in the measuring 
apparatus due to motion. Instead, he insisted that Maxwell’s equations are un- 
affected by uniform motion — only the change in coordinates due to uniform 
motion is not (4.1.1), but something else. 

What was truly new and remarkable in Einstein’s paper was that in working 
out this change of coordinates he supposed that the time coordinate, as well as 
the space coordinates, is affected by the motion of an observer. In writing the 
Galilean transformation in Eq. (4.1.1) I was careful to include the specification 
t’ = t. That was an anachronism — no one before Einstein would have bothered 
to specify that the time coordinate is unaffected by the motion of an observer. It 
was then universally supposed that the flow of time is unaffected by motion or 
anything else. Now Einstein was contemplating the possibility that time as well 
as distance is affected by an observer’s motion. 

Einstein calculated the effect of motion on space and time coordinates by a 
variety of thought experiments, under the assumption that times and distances 
would be measured using light rays. Though he did not put it in this way, he
was in effect working out what coordinate transformations leave Maxwell’s
equations, and in particular the speed of light, unchanged. Of course, we can
redefine spacetime coordinates any way we like. There is no physics content
to a prescription of how to transform coordinates. As we shall see, what was
new about the physics introduced by Einstein was not in his change of the
coordinate transformation, to keep the speed of light constant, but his hypothesis
that these new transformations leave the equations of mechanics as well as
electrodynamics invariant. This was not true of Newton’s equations, so Einstein
had to change these equations, with profound consequences for physics.

6 A. Einstein, Ann. Phys. 17, 891 (1905).

This work of Einstein started what became one of the continuing preoccu- 
pations of modern physics: the study of hypothetical principles of invariance 
and their physical implications. Instead of working through Einstein’s thought 
experiments, the discussion below adopts a more modern spirit. In this section 
we learn what transformations of space and time coordinates, known as Lorentz 
transformations, leave the speed of light invariant; in the following section we 
work out the consequences of the assumption that the laws that govern the 
rigidity of rulers and the ticking of clocks — whatever they are — are invariant 
under Lorentz transformations; in Section 4.4 we calculate the implications of 
the assumption that all the laws of mechanics are invariant under these transfor- 
mations; in Section 4.5 we find the consequences of Lorentz invariance for the 
properties of photons; and in Section 4.6 we check that not only the speed of 
light but Maxwell’s whole theory of electrodynamics is invariant under Lorentz 
transformations. In this work we shall make use of a compact spacetime notation
introduced in 1907 by Hermann Minkowski⁷ (1864-1909).


Lorentz Transformations 


Let us first consider what sort of spacetime transformation preserves the speed
of light. If a light wave front shifts its position by a vector $\Delta\mathbf{x}$ in a time interval
$\Delta t$, then if light travels at a speed $c$ we have $|\Delta\mathbf{x}| = c\Delta t$, or in other words

$$ 0 = (\Delta\mathbf{x})^2 - c^2(\Delta t)^2 . \qquad (4.2.1) $$

So, what sort of transformation leaves invariant the quantity $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$?

Before answering this question, it may be mentioned that there is a larger
group of transformations that leave $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$ invariant only when it
vanishes. These are known as conformal transformations. One simple example is a
rescaling $\mathbf{x} \to \lambda\mathbf{x}$, $t \to \lambda t$, with $\lambda$ an arbitrary constant. Invariance of the laws
of nature under conformal transformations would be enough to keep the speed
of light the same for all observers, but it would apparently make it impossible to
deal with non-zero masses. Nevertheless, conformal symmetry has been revived
again and again up to the present as a possible property of physical law at the
most fundamental level, hidden from us through dynamical effects of one sort
or another. Here we shall content ourselves with asking about the more limited
class of transformations that leave $(\Delta\mathbf{x})^2 - c^2(\Delta t)^2$ invariant, whether or not it
vanishes.

7 H. Minkowski, lecture delivered to the Math. Ges. Göttingen, November 5, 1907, published in Ann. Phys.
47, 927 (1915).

As mentioned above, it will be very convenient to adopt the spacetime notation
due to Minkowski, with a fourth coordinate $x^0 = ct$. We use letters from
the middle of the Greek alphabet to label the coordinates of events in spacetime,
as $x^\mu$, $x^\nu$, etc. Then the right-hand side of Eq. (4.2.1) may be written

$$ (\Delta\mathbf{x})^2 - c^2(\Delta t)^2 = \eta_{\mu\nu}\Delta x^\mu \Delta x^\nu , $$

it being understood that repeated indices are summed over the values 1, 2, 3, 0.
Here $\eta_{\mu\nu}$ is the matrix

$$ \eta_{\mu\nu} = \begin{cases} +1 & \mu = \nu = 1, 2, 3 \\ -1 & \mu = \nu = 0 \\ 0 & \mu \neq \nu . \end{cases} \qquad (4.2.2) $$

In this notation, the condition we impose on coordinate transformations
$x^\mu \to x'^\mu$ may be written

$$ \eta_{\mu\nu}\Delta x'^\mu \Delta x'^\nu = \eta_{\rho\sigma}\Delta x^\rho \Delta x^\sigma . \qquad (4.2.3) $$

It can be shown⁸ that the most general transformation of the spacetime coordinates
that satisfies this condition is linear:

$$ \Delta x'^\mu = \Lambda^\mu{}_\rho \Delta x^\rho , \qquad (4.2.4) $$

with $\Lambda^\mu{}_\nu$ some set of constants. (We are excluding translations here, under
which $x^\mu$ would change by a constant term $a^\mu$, because $\Delta x^\mu$ is a difference
of spacetime coordinates and hence unaffected by translations.) Recall that the
repetition here of the index $\rho$ indicates that this index is to be summed over the
values 1, 2, 3, 0. Condition (4.2.3) now reads

$$ \eta_{\mu\nu}\Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma \Delta x^\rho \Delta x^\sigma = \eta_{\rho\sigma}\Delta x^\rho \Delta x^\sigma . $$

In order for this to be valid for any $\Delta x^\rho$, the coefficients of $\Delta x^\rho \Delta x^\sigma$ on both
sides must be equal:

$$ \eta_{\mu\nu}\Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma = \eta_{\rho\sigma} , \qquad (4.2.5) $$

for all values of the spacetime coordinate indices $\rho$ and $\sigma$. Transformations
(4.2.4) with $\Lambda^\mu{}_\nu$ satisfying (4.2.5) are known as Lorentz transformations.


8 For a proof, see S. Weinberg, Gravitation and Cosmology (Wiley, New York, 1972), Section 2.1.

It will be instructive to consider the special class of coordinate transformations
that act only on the $\mu = 3$ and $\mu = 0$ components of $\Delta x^\mu$, which as we
shall see is the case for transformations to a frame of reference moving along
the 3-axis. For any such linear transformation, the only non-zero components of
$\Lambda^\mu{}_\nu$ are

$$ \Lambda^1{}_1 = \Lambda^2{}_2 = 1 , \quad \Lambda^3{}_3 = A , \quad \Lambda^0{}_0 = B , \quad \Lambda^3{}_0 = C , \quad \Lambda^0{}_3 = D , $$

with real constants A, B, C, and D that are constrained by the condition that this
is a Lorentz transformation. In matrix notation, with $\Lambda^\mu{}_\nu$ given by the element
of the matrix in row $\mu$ and column $\nu$:

$$ \Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & A & C \\ 0 & 0 & D & B \end{pmatrix} , $$

the rows and columns being labeled in the order 1, 2, 3, 0.

Inserting the formulas for the components of $\Lambda^\mu{}_\nu$ into Eq. (4.2.5) gives
nothing new if $\rho$ or $\sigma$ equals 1 or 2, while for $\rho = \sigma = 3$, $\rho = \sigma = 0$,
and $\rho = 3$, $\sigma = 0$ (or $\rho = 0$, $\sigma = 3$), we get respectively

$$ A^2 - D^2 = 1 , \quad C^2 - B^2 = -1 , \quad AC - DB = 0 . $$

With three conditions on four parameters, there will be one free parameter left
when all conditions are satisfied. We will take this parameter as

$$ \beta \equiv C/B = D/A . $$

From $A^2 - D^2 = 1$ we then have

$$ A^2 = \frac{1}{1-\beta^2} , \quad D^2 = \frac{\beta^2}{1-\beta^2} , $$

while from $C^2 - B^2 = -1$ we have

$$ B^2 = \frac{1}{1-\beta^2} , \quad C^2 = \frac{\beta^2}{1-\beta^2} . $$

To find the signs of A and B we impose an additional limitation on the transformations
we are considering, that they can be obtained by a smooth change of
parameters such as velocities and angles from a Lorentz transformation that
does nothing. In our case, of transformations only of $x^3$ and $x^0$, a Lorentz
transformation that does nothing has A = B = 1 and C = D = 0. Neither
A nor B can vanish for any $\beta$, so if signs do not suddenly change as we change
the parameters of the Lorentz transformation, we must have A and B positive
for all Lorentz transformations of this form. Since AC = DB, this tells us also
that C and D have the same sign, which by definition is the sign of $\beta$. So our
conclusion is that the non-vanishing components of $\Lambda^\mu{}_\nu$ are

$$ \Lambda^3{}_3 = \Lambda^0{}_0 = \gamma , \quad \Lambda^3{}_0 = \Lambda^0{}_3 = \beta\gamma , \quad \Lambda^1{}_1 = \Lambda^2{}_2 = 1 , \qquad (4.2.6) $$

where $\gamma$ is the positive quantity

$$ \gamma \equiv +\frac{1}{\sqrt{1-\beta^2}} . \qquad (4.2.7) $$

That is, in matrix notation,

$$ \Lambda^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma & \beta\gamma \\ 0 & 0 & \beta\gamma & \gamma \end{pmatrix} . \qquad (4.2.8) $$

The free parameter $\beta$ can have any sign, but $|\beta| < 1$. We will see in Section 4.6
that not only the speed of light but also the complete set of Maxwell’s equations
are invariant under these transformations.
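
The claim that the matrix (4.2.8) satisfies the condition (4.2.5) can be checked numerically. The following is a minimal sketch in plain Python, with indices kept in the order 1, 2, 3, 0 used in the text; the names `boost3` and `lorentz_defect` are chosen here for illustration.

```python
import math

# eta_{mu nu}, rows and columns in the order 1, 2, 3, 0
ETA = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, -1]]


def boost3(beta):
    """Matrix (4.2.8): a boost along the 3-axis with velocity c*beta."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, g, beta * g],
            [0, 0, beta * g, g]]


def lorentz_defect(lam):
    """Largest |eta_{mu nu} Lam^mu_rho Lam^nu_sigma - eta_{rho sigma}|."""
    worst = 0.0
    for r in range(4):
        for s in range(4):
            total = sum(ETA[m][n] * lam[m][r] * lam[n][s]
                        for m in range(4) for n in range(4))
            worst = max(worst, abs(total - ETA[r][s]))
    return worst


if __name__ == "__main__":
    for beta in (0.0, 0.6, -0.9):
        print(beta, lorentz_defect(boost3(beta)))
```

The defect should vanish up to rounding for any $|\beta| < 1$, while a matrix that merely stretches one coordinate fails the test.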

I’ll pause to mention that there are other Lorentz transformations that cannot
be obtained by a gradual variation of parameters from a Lorentz transformation
that does nothing. These include the space inversion $x^3 \to -x^3$ with $x^\mu$
unchanged for $\mu \neq 3$, and the time reversal $x^0 \to -x^0$ with $x^\mu$ unchanged
for $\mu \neq 0$. (Space inversion is often described as a change of sign of all three
Cartesian coordinates, but this transformation can be produced by the reversal
of any one coordinate, followed by an ordinary rotation of 180° around that
coordinate direction.) As will be discussed in Section 6.5, experiments in the
1950s showed that invariance under space inversion is only a good approximation,
being violated by the very weak forces that lead to the decay of some
radioactive nuclei and elementary particles, and in Section 2.4 we have already
mentioned that the same is true of invariance under time reversal. We will be
concerned here only with transformations that can be obtained by a gradual
variation of parameters from a Lorentz transformation that does nothing. (These
are known as proper orthochronous Lorentz transformations: proper, meaning
that the determinant of the matrix $\Lambda^\mu{}_\nu$ is unity, and orthochronous, meaning
that $\Lambda^0{}_0 > 0$. In this book, I will refer to proper orthochronous Lorentz
transformations simply as “proper.”)

Now let us consider the physical meaning of $\beta$. Consider a tiny body at rest
in the frame of reference with coordinates $x^\mu$. At two different times, separated
by a time difference $\Delta t$, the body is at the same position, so the separation of
positions is $\Delta x^i = 0$ with $i = 1, 2, 3$. Now suppose we look at the same body in
the frame of reference with coordinates $x'^\mu$, given by the Lorentz transformation
(4.2.6). The 1- and 2-coordinates will be unaffected, but the 3-coordinates and
the times in the new frame of reference will be separated by

$$ \Delta x'^3 = \Lambda^3{}_0 \Delta x^0 = \beta\gamma\Delta x^0 , \quad \Delta t' = \Delta x'^0/c = \Lambda^0{}_0 \Delta x^0/c = \gamma\Delta x^0/c . \qquad (4.2.9) $$

So in this frame the body has velocity

$$ v = \Delta x'^3/\Delta t' = c\beta . \qquad (4.2.10) $$

Therefore $c\beta$ is the velocity in the 3-direction given to a body at rest by the
Lorentz transformation (4.2.6).


The Galilean Limit 


For velocities that are much less than the speed of light, $|\beta| = |v|/c$ is much
less than one, and Eq. (4.2.7) then gives $\gamma$ very close to one. In this case, setting
$\beta = v/c$, $\gamma = 1$, and $x^0 = ct$, the transformation (4.2.9) becomes

$$ \Delta x'^3 = v\Delta t , \quad \Delta t' = \Delta t . $$

This is the same as the Galilean transformation (4.1.1) used for instance in
working out the form of the Navier-Stokes equation in Section 2.5.
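
To get a feeling for how good the Galilean approximation is, one can tabulate $\gamma - 1$ for a few speeds. This is only an illustrative sketch: the sample speeds and their labels are assumptions chosen here, and the speed of light is rounded to 300 000 km/sec.

```python
import math


def gamma(beta):
    """Eq. (4.2.7): gamma = 1 / sqrt(1 - beta^2)."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)


if __name__ == "__main__":
    samples = (("airliner (assumed 0.25 km/sec)", 0.25),
               ("Earth's orbital speed", 30.0),
               ("one tenth of c", 3.0e4))
    for label, v_km_s in samples:
        beta = v_km_s / 3.0e5
        # For small beta, gamma - 1 is close to beta^2 / 2.
        print(label, beta, gamma(beta) - 1.0, beta ** 2 / 2)
```

For the Earth's orbital speed the departure of $\gamma$ from one is about five parts in a billion, which is why setting $\gamma = 1$ is harmless in everyday mechanics.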


Maximum Speed 


From Eqs. (4.2.7) and (4.2.10), we see that it is not possible for a finite Lorentz 
transformation to take a body from rest to a velocity greater than or even as 
large as c. (In Section 4.7 we will see that causality, the principle that effects 
cannot precede causes, rules out any signal traveling faster than light.) This 
may be surprising, because we can perform a pair of Lorentz transformations, 
each of which gives a body at rest a velocity in the 3-direction greater than c/2, 
which if these were Galilean transformations of the form (4.1.1) when combined 
would give a Galilean transformation from rest to a velocity greater than c. But 
velocities add differently in Einsteinian relativity. 

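Suppose we perform a Lorentz transformation $x^\mu \to x'^\mu = \Lambda_1{}^\mu{}_\nu x^\nu$ that
gives a particle initially at rest a velocity $c\beta_1$ in the 3-direction and then perform
a Lorentz transformation $x'^\mu \to x''^\mu = \Lambda_2{}^\mu{}_\nu x'^\nu$ that gives the particle that was
initially at rest a velocity $c\beta_2$ in the same direction. The combined effect is a
linear transformation

$$ x^\mu \to x''^\mu = \Lambda_2{}^\mu{}_\rho \Lambda_1{}^\rho{}_\nu x^\nu = \Lambda_{21}{}^\mu{}_\nu x^\nu , $$

where

$$ \Lambda_{21}{}^\mu{}_\nu = \Lambda_2{}^\mu{}_\rho \Lambda_1{}^\rho{}_\nu . $$

In matrix notation, this means that

$$ \Lambda_{21} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_2 & \beta_2\gamma_2 \\ 0 & 0 & \beta_2\gamma_2 & \gamma_2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_1 & \beta_1\gamma_1 \\ 0 & 0 & \beta_1\gamma_1 & \gamma_1 \end{pmatrix} , $$

where, according to the general rules of matrix multiplication, the element in
row $\mu$ and column $\nu$ of the product is the sum over $\rho$ of the products of the
terms in row $\mu$ and column $\rho$ of the first matrix times the terms in row $\rho$ and
column $\nu$ in the second matrix. It is straightforward to calculate that

$$ \Lambda_{21} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \gamma_{21} & \beta_{21}\gamma_{21} \\ 0 & 0 & \beta_{21}\gamma_{21} & \gamma_{21} \end{pmatrix} , $$

where

$$ \gamma_{21} = \gamma_1\gamma_2(1 + \beta_1\beta_2) , \quad \beta_{21}\gamma_{21} = \gamma_1\gamma_2(\beta_1 + \beta_2) , $$

and therefore

$$ \beta_{21} = \frac{\beta_1 + \beta_2}{1 + \beta_1\beta_2} , \quad \gamma_{21} = \frac{1}{\sqrt{1 - \beta_{21}^2}} . $$
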

Thus the relativistic rule for combining velocities is that a Lorentz transformation
with velocity $v_1 = c\beta_1$ followed by a Lorentz transformation with velocity
$v_2 = c\beta_2$ in the same direction gives a Lorentz transformation with velocity

$$ v_{21} = c\beta_{21} = \frac{v_1 + v_2}{1 + v_1 v_2/c^2} . \qquad (4.2.11) $$

Even if $v_1$ and $v_2$ both approach $c$, the combined velocity $v_{21}$ approaches $c$,
not $2c$.
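
The composition rule (4.2.11) can be confirmed by multiplying two boost matrices and reading off the combined velocity from the ratio of the (3,0) and (0,0) elements. The sketch below does this in plain Python with velocities in units of $c$; the helper names are chosen here for illustration.

```python
import math


def boost3(beta):
    """Boost along the 3-axis, rows and columns in the order 1, 2, 3, 0."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, g, beta * g],
            [0, 0, beta * g, g]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def combined_beta(beta1, beta2):
    """beta_21 from the product matrix: Lambda^3_0 / Lambda^0_0."""
    lam21 = matmul(boost3(beta2), boost3(beta1))
    return lam21[2][3] / lam21[3][3]


def velocity_sum(beta1, beta2):
    """Eq. (4.2.11) with velocities in units of c."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)


if __name__ == "__main__":
    for b1, b2 in ((0.6, 0.6), (0.9, 0.9), (0.99, 0.99)):
        print(b1, b2, combined_beta(b1, b2), velocity_sum(b1, b2))
```

Two boosts of 0.6c combine to about 0.88c rather than 1.2c, and the result stays below $c$ however close the inputs come to it.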


General Directions 


Of course there is nothing special about the 3-direction. Whatever velocity
vector $\mathbf{v}$ is given to a body at rest by a given Lorentz transformation, we can
always rotate our coordinate axes so that the 3-direction is in the direction of $\mathbf{v}$.
The Lorentz transformation consequently will have the form (4.2.6), but with
$\beta = |\mathbf{v}|/c$. If we rotate our coordinate axes back to their original direction,
we find

$$ \Lambda^i{}_j = \delta_{ij} + (\gamma - 1)\hat{v}_i \hat{v}_j , \quad \Lambda^i{}_0 = \Lambda^0{}_i = \gamma v_i/c , \quad \Lambda^0{}_0 = \gamma , \qquad (4.2.12) $$

where $i$ and $j$ run over the spatial coordinate indices 1, 2, 3; $\delta_{ij}$ is the unit
matrix,

$$ \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j , \end{cases} $$

and

$$ \gamma \equiv 1/\sqrt{1 - \mathbf{v}^2/c^2} . \qquad (4.2.13) $$

Here $\hat{\mathbf{v}}$ is the unit vector $\mathbf{v}/|\mathbf{v}|$. (To check, note that for $\mathbf{v}$ in the 3-direction,
Eq. (4.2.12) gives $\Lambda^1{}_1 = \Lambda^2{}_2 = 1$, $\Lambda^3{}_3 = 1 + (\gamma - 1) = \gamma$, and so on for
the other components.) Performing a Lorentz transformation with velocity $\mathbf{v}_1$
followed by a Lorentz transformation with velocity $\mathbf{v}_2$ in general does not give
a Lorentz transformation of the form (4.2.12), unless $\mathbf{v}_1$ and $\mathbf{v}_2$ happen to be
in the same direction. In general we get a rotation, followed by a Lorentz
transformation of the form (4.2.12). This is not a contradiction, because
rotations satisfy the condition (4.2.5) and therefore can be considered as
belonging to a subgroup of the group of Lorentz transformations. Lorentz
transformations of the special form (4.2.12) are often distinguished from more
general Lorentz transformations by calling them boosts.
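
A boost matrix of the form (4.2.12) is symmetric, so one simple way to see that two boosts in different directions do not combine into a pure boost is to check that their product is not symmetric, even though it still satisfies (4.2.5). The sketch below illustrates this with velocities in units of $c$; the helper names and sample velocities are assumptions made here.

```python
import math

# eta_{mu nu}, rows and columns in the order 1, 2, 3, 0
ETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]


def boost(v):
    """Boost matrix of Eq. (4.2.12) for velocity v = (v1, v2, v3), c = 1."""
    speed_sq = sum(x * x for x in v)
    g = 1.0 / math.sqrt(1.0 - speed_sq)
    lam = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            # (gamma - 1) * vhat_i * vhat_j, with vhat = v / |v|
            extra = (g - 1.0) * v[i] * v[j] / speed_sq if speed_sq > 0 else 0.0
            lam[i][j] = delta + extra
        lam[i][3] = lam[3][i] = g * v[i]
    lam[3][3] = g
    return lam


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def lorentz_defect(lam):
    """Largest violation of condition (4.2.5)."""
    return max(abs(sum(ETA[m][n] * lam[m][r] * lam[n][s]
                       for m in range(4) for n in range(4)) - ETA[r][s])
               for r in range(4) for s in range(4))


def asymmetry(lam):
    """Largest |Lam[i][j] - Lam[j][i]|; zero for a pure boost."""
    return max(abs(lam[i][j] - lam[j][i]) for i in range(4) for j in range(4))


if __name__ == "__main__":
    product = matmul(boost((0.0, 0.0, 0.6)), boost((0.6, 0.0, 0.0)))
    print(lorentz_defect(product), asymmetry(product))
```

The product passes the Lorentz condition but is visibly asymmetric, which signals the extra rotation mentioned above.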


Special and General Relativity 


A decade after presenting the special theory of relativity, Einstein gave us the 
general theory of relativity.⁹ As its name implies, this theory is based on a
more general principle of invariance than for special relativity: the laws of 
nature preserve their form under any possible change of spacetime coordinates, 
not just under Lorentz transformations. 

But it should not be thought that special relativity is in any way superseded by 
general relativity. In general relativity it is still true that in certain inertial frames 
of reference, in free fall around any local matter and otherwise more or less at 
rest or in a state of uniform motion with respect to the average matter of the universe,
the laws of physics are those of special relativity. For example, in inertial
frames the separation $\Delta x^\mu$ of spacetime coordinates along a wave front of light
satisfies $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = 0$; we will see in the next section that the separation
$\Delta x^\mu$ of the spacetime coordinates of two ticks of a moving clock whose ticks are
$T$ seconds apart at rest satisfies $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = -T^2c^2$; and so on. If we make
a coordinate transformation other than a Lorentz transformation, for instance to
a frame of reference that accelerates or spins relative to the inertial frames,
then the laws take a more general form, in which $\eta_{\mu\nu}$ is replaced with a field
$g_{\mu\nu}(x)$. This field describes gravitation and satisfies differential equations that
generalize and correct Newton’s formula for gravitational attraction. In contrast, 
there is no field or any other physical quantity in special relativity that keeps 
track of the velocity of the coordinate system. So invariance plays a different 
role in general and special relativity. General relativity is a theory of the grav- 
itational field, a quantity that keeps track of departures from inertial frames. 
Special relativity is a theory of invariance under Lorentz transformations from 
one inertial frame to another. 


9 A. Einstein, Ann. Phys. 49, 769 (1916).


4.3 Clocks, Rulers, Light Waves


As a first application of Einsteinian relativity we now apply the assumption
that the laws that govern the rigidity of rulers and the operation of clocks are 
invariant under the Lorentz transformations described in the previous section, 
and we use this assumption to calculate the effects of motion on observed 
distances and times. We shall also anticipate the demonstration in Section 4.6 
that Maxwell’s equations are Lorentz invariant, using this invariance to work 
out the effect of motion on the frequency and wave vectors of electromagnetic 
waves. 


Clocks 


Consider two ticks separated by a time interval $T$ of a small clock at rest in
the frame of reference with coordinates $x^\mu$. The spacetime coordinates of these
ticks are separated by $\Delta x^i = 0$, $\Delta x^0 = cT$, as usual with $i = 1, 2, 3$. Now
perform a Lorentz transformation (4.2.6) that gives the clock a velocity $v = \beta c$
in the 3-direction. The 1- and 2-coordinates of the clock will be unaffected,
while in the new reference frame the 3- and 0-coordinates of the clock at these
two ticks will be separated by

$$ \Delta x'^3 = \Lambda^3{}_0 \Delta x^0 = \gamma\beta cT = \gamma vT , \qquad (4.3.1) $$
$$ \Delta x'^0 = \Lambda^0{}_0 \Delta x^0 = \gamma cT , \qquad (4.3.2) $$

where as before $\gamma = (1 - v^2/c^2)^{-1/2}$. From Eq. (4.3.2) we see that the time
interval between ticks of the moving clock is lengthened to $T' = \Delta x'^0/c = \gamma T$.
This is what is seen by the observer who sees the clock moving with velocity $v$;
an observer who travels with the clock sees its ticks separated by $T$, just as if it
were at rest.

There is another way of getting this result without ever looking at a specific
Lorentz transformation. If the time interval between ticks of a clock at rest is $T$,
then the spacetime separation between ticks at rest has components $\Delta x^i = 0$,
$\Delta x^0 = cT$, which satisfy $\eta_{\mu\nu}\Delta x^\mu \Delta x^\nu = -c^2T^2$, where $\eta_{\mu\nu}$ is again the diagonal
matrix (4.2.2) with elements 1, 1, 1, $-1$ on the diagonal, and the summation
convention is again in force. If an observer sees the clock moving with velocity
$\mathbf{v}$ in any direction and measures a time $T'$ between ticks, then in the coordinates
$x'^\mu$ used by this observer the spacetime separation between ticks has components
$\Delta\mathbf{x}' = \mathbf{v}T'$, $\Delta x'^0 = cT'$, which satisfy $\eta_{\mu\nu}\Delta x'^\mu \Delta x'^\nu = (\mathbf{v}^2 - c^2)T'^2$.
But, as discussed in the previous section, Lorentz transformations are designed
to keep this quantity invariant, in the sense that $\eta_{\mu\nu}\Delta x'^\mu \Delta x'^\nu = \eta_{\mu\nu}\Delta x^\mu \Delta x^\nu$.
Therefore $(\mathbf{v}^2 - c^2)T'^2 = -c^2T^2$, so as before $T' = T/\sqrt{1 - \mathbf{v}^2/c^2}$.

This lengthening of course applies to any kind of time interval, not just
ticks of a clock. It is vividly displayed in the decay of unstable particles
in cosmic rays. The collision of atomic nuclei in primary cosmic rays with
atoms in the upper atmosphere produces particles known as muons, resembling
electrons but about 210 times heavier. At rest, muons are observed to
decay with a mean lifetime of 2.2 microseconds, but although they are typically
produced at an altitude of about 15 km, a good fraction of these muons reach
the ground before decaying, so even traveling near the speed of light they
must have survived for a time (as measured on the Earth’s surface) at least
(15 km)/(300 000 km/sec) = 50 microseconds, and more if they reach the ground
at a slant. If there were no relativistic time dilation, then the probability
of a particle with a mean lifetime of 2.2 microseconds surviving as long as
50 microseconds would be $\exp(-50/2.2) \approx 1.3 \times 10^{-10}$. Evidently the life of
these muons is extended by their motion by a factor $\gamma$ at least of order 10, which
requires their velocity to be within a fraction of a percent of the speed of light.
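
The muon estimate is easy to reproduce. The sketch below computes the survival probability over a 15 km descent with and without time dilation, using the rounded values quoted in the text; the function name and the choice $\gamma = 10$ for the dilated case are assumptions made here for illustration.

```python
import math

MUON_LIFETIME = 2.2e-6   # mean lifetime at rest, in seconds
C_KM_S = 3.0e5           # speed of light in km/sec, rounded as in the text


def survival_probability(distance_km, beta, dilated=True):
    """Probability that a muon with speed c*beta survives the given distance.

    With dilated=False the lifetime is left at its rest value, which is what
    one would expect without relativistic time dilation.
    """
    travel_time = distance_km / (beta * C_KM_S)
    g = 1.0 / math.sqrt(1.0 - beta ** 2) if dilated else 1.0
    return math.exp(-travel_time / (g * MUON_LIFETIME))


if __name__ == "__main__":
    print(survival_probability(15.0, 1.0, dilated=False))
    beta_for_gamma_10 = math.sqrt(1.0 - 1.0 / 10.0 ** 2)
    print(beta_for_gamma_10, survival_probability(15.0, beta_for_gamma_10))
```

Without dilation the survival probability is of order $10^{-10}$; with $\gamma = 10$ roughly one muon in ten reaches the ground.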


Rulers 


Next consider a ruler of length $L$ at rest, lying along the 3-direction in a frame
of reference with coordinates $x^\mu$. At any fixed time its ends are separated in this
frame by $\Delta x^3 = L$, $\Delta x^0 = 0$, and $\Delta x^1 = \Delta x^2 = 0$. Now perform a Lorentz
transformation (4.2.6) that gives the ruler a velocity $v$ (positive or negative) in
the 3-direction. The spacetime coordinates $x'^\mu$ in the new reference frame will
be separated by

$$ \Delta x'^3 = \Lambda^3{}_3 L = \gamma L , \quad \Delta x'^1 = \Delta x'^2 = 0 , \qquad (4.3.3) $$
$$ \Delta x'^0 = \Lambda^0{}_3 L = \gamma vL/c . \qquad (4.3.4) $$

But Eq. (4.3.4) shows that in this frame the two ends of the ruler have been
traveling for times that differ by an amount $\Delta t' = \gamma vL/c^2$, so to find the
difference in the space coordinates at the same time $t'$, we have to subtract $v\Delta t'$
from $\Delta x'^3$. The spatial separation of the ends of the ruler at the same time $t'$ is
then

$$ \Delta x'^3 - v\Delta t' = \gamma L - \gamma Lv^2/c^2 = L/\gamma . \qquad (4.3.5) $$

This contraction of lengths in the direction of motion is similar to what Fitzgerald
and Lorentz had proposed as the cause of the failure to measure the velocity
of the Earth through the aether.
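
The three steps of the ruler argument, Eqs. (4.3.3) to (4.3.5), can be followed numerically. This is a minimal sketch with $c = 1$, so that $v = \beta$; the function name is chosen here.

```python
import math


def contracted_length(rest_length, beta):
    """Length of a moving ruler, following Eqs. (4.3.3)-(4.3.5) with c = 1."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    dx3 = g * rest_length          # Eq. (4.3.3): separation of the end events
    dx0 = g * beta * rest_length   # Eq. (4.3.4): c times their time difference
    # Eq. (4.3.5): subtract the distance v * dt' covered during that time.
    return dx3 - beta * dx0


if __name__ == "__main__":
    for beta in (0.1, 0.6, 0.99):
        g = 1.0 / math.sqrt(1.0 - beta ** 2)
        print(beta, contracted_length(1.0, beta), 1.0 / g)
```

Both printed columns should agree, confirming that the simultaneous separation of the ends is $L/\gamma$.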


Light Waves 


We saw in Eq. (3.1.3) that each component of the electromagnetic fields in a
light wave in empty space can be written as a sum of terms proportional to
$e^{\pm i\phi}$ where $\phi$ is the phase:

$$ \phi = \mathbf{k}\cdot\mathbf{x} - \omega t . \qquad (4.3.6) $$

We could always add a spacetime-independent term to $\phi$ by adjusting the phase
of the coefficients $\mathbf{e}$ and $\mathbf{b}$ of $e^{i\phi}$ in the fields (3.1.3), so it is only the difference
$\Delta\phi$ in $\phi$ between spacetime points that has physical significance. We expect
such phase differences to be Lorentz invariant, because, as we shall see in
Section 4.6, Lorentz transformations subject electromagnetic fields to real linear
transformations. So we need to give $\mathbf{k}$ and $\omega$ Lorentz transformation properties
that ensure the Lorentz invariance of the phase differences:

$$ \Delta\phi = \mathbf{k}\cdot\Delta\mathbf{x} - \omega\Delta t . \qquad (4.3.7) $$

To see how to manage this, we once again introduce a four-dimensional
notation, taking $k^0 = \omega/c$, so that Eq. (4.3.7) reads

$$ \Delta\phi = \mathbf{k}\cdot\Delta\mathbf{x} - k^0\Delta x^0 = \eta_{\mu\nu}k^\mu \Delta x^\nu , \qquad (4.3.8) $$

where $\eta_{\mu\nu}$ is again the diagonal matrix (4.2.2) with elements 1, 1, 1, $-1$
on the diagonal, and the summation convention is again in force. It is
obvious then that $\Delta\phi$ will be Lorentz invariant if we ascribe to $k^\mu$ the same
Lorentz transformation as $\Delta x^\mu$, that is, if under a Lorentz transformation
$\Delta x^\mu \to \Lambda^\mu{}_\nu \Delta x^\nu$ we have

$$ k^\mu \to \Lambda^\mu{}_\nu k^\nu , \qquad (4.3.9) $$

where again $\Lambda^\mu{}_\nu$ satisfies the condition (4.2.5) for a Lorentz transformation,
$\eta_{\mu\nu}\Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma = \eta_{\rho\sigma}$. Note that the transformation (4.3.9) also preserves the
condition

$$ 0 = c^2|\mathbf{k}|^2 - \omega^2 = c^2\eta_{\mu\nu}k^\mu k^\nu , \qquad (4.3.10) $$

which says that the wave $\propto e^{i\phi}$ travels at the speed of light.

For example, consider a light wave traveling in the +3-direction, which has 
wave vector with k! = k? = 0 and k? = w/c and frequency v = w/27 
in a reference frame with spacetime coordinates x”. Suppose we perform a 
Lorentz transformation x“ — x’“ = A", x” that gives bodies at rest in the 
first reference frame a velocity v (positive or negative) in the 3-direction in the 
new reference frame. With A“, given by Eq. (4.2.6), the frequency in the new 
reference frame is given by 


v! =! /2n = ck /2m = c(A°3k? + A ok®) /2x 
= (A°3+ A%)v =y(l+v/o)v. (4.3.11) 
Using y* = 1/(1 — v*/c?), we can rewrite this in a more revealing form: 
vay l!d—v/c)'v. (4.3.12) 


The factor $1/\gamma$ is the relativistic time dilation discussed above for moving 
clocks: if time intervals are lengthened by a factor $\gamma$, then frequencies are 
decreased by a factor $1/\gamma$. 

The factor $(1 - v/c)^{-1}$ is the usual Doppler shift, which applies in both 
non-relativistic and relativistic contexts, and indeed was first observed in sound 
waves. If the source of the light wave is at rest in the reference frame with 
coordinates $x^\mu$, then the Lorentz transformation $\Lambda^\mu{}_\nu$ gives the source a velocity 
$v$ in the 3-direction, which for $v$ positive is along the direction of the light wave 
and hence toward whoever is observing the wave. If the time interval between 
wave crests emitted by the source at rest is $1/\nu$, then, apart from relativistic 
effects, the observer will see these crests arrive at a time interval less by a factor 
$1 - v/c$, since the distance that each crest has to travel is less than that for the 
previous crest by a factor $1 - v/c$, and hence the observed frequency, the rate 
at which wave crests arrive at the observer, is increased (apart from relativistic 
time dilation) by a factor $1/(1 - v/c)$. For negative $v$ the source is moving away 
from the observer, and the factor $(1 - v/c)^{-1}$ gives a decrease in frequency, as 
seen in the redshift of light from receding galaxies at great distances. 
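
As a quick numerical check of Eqs. (4.3.11) and (4.3.12), the following Python sketch evaluates both forms of the frequency ratio $\nu'/\nu$. The function names and the sample values of $\beta = v/c$ are illustrative choices, not part of the original text.

```python
import math

def doppler_factor(beta):
    """Ratio nu'/nu from Eq. (4.3.11): gamma * (1 + v/c), with beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (1.0 + beta)

def doppler_factor_split(beta):
    """Same ratio written as in Eq. (4.3.12): time dilation times the usual Doppler factor."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    time_dilation = 1.0 / gamma          # relativistic slowing of the moving source
    usual_doppler = 1.0 / (1.0 - beta)   # non-relativistic crest-bunching factor
    return time_dilation * usual_doppler

# Positive beta: source approaching (blueshift); negative beta: receding (redshift).
for beta in (0.5, 0.01, -0.5):
    print(beta, doppler_factor(beta), doppler_factor_split(beta))
```

The two functions should agree to rounding error for any $|\beta| < 1$, which is just the algebraic identity used to pass from (4.3.11) to (4.3.12).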


4.4 Mass, Energy, Momentum, Force 


Einstein published two papers on relativity theory in 1905. Shortly after the first 
paper, which is cited in Section 4.2, he published in the same journal another 
paper$^{10}$ with the title “Does the inertia of a body depend on its energy content?” 
This is often referred to as “the $E = mc^2$ paper,” but as can be gathered from 
the title, it would be better called “the $m = E/c^2$ paper.” In this paper, Einstein 
showed that the mass of a body decreases by an amount $E/c^2$ when the body 
emits radiation with energy $E$. Here “mass” was defined as inertial mass, by the 
prescription that, as in Newtonian mechanics, the kinetic energy of a particle of 
mass $m$ with velocity $v \ll c$ is $mv^2/2$. 


Einstein’s Thought Experiment 


Here is the proof of Einstein’s result. Consider a particle such as an atomic 
nucleus, at rest in a reference frame with coordinates $x^\mu$, in an excited state $A$. 
Suppose that it decays into a state $B$ of lower energy, emitting two “back-to-back” 
photons of equal energy traveling in opposite directions along the 3-axis. 
The symmetry of the problem rules out any recoil of the particle in its final 
state $B$, so there is no kinetic energy in the initial or final states and hence 
each photon must carry energy $(E_A - E_B)/2$ and therefore have frequency 
$\nu = (E_A - E_B)/2h$. 

10 A. Einstein, Ann. Phys. 18, 639 (1905). 

Now consider the same process as observed in a reference frame with coordinates 
$x'^\mu = \Lambda^\mu{}_\nu x^\nu$, with $\Lambda^\mu{}_\nu$ the Lorentz transformation (4.2.6). In this frame 
the decaying particle is traveling with velocity $v$ in the +3-direction both before 
and after the decay. Suppose that $v \ll c$. Before it decays, the total energy of 
the particle is its internal energy $E_A$ plus its kinetic energy: 


$$E_{\rm before} = E_A + \tfrac{1}{2} m_A v^2 .$$


According to Eq. (4.3.11), in this reference frame the frequencies of the photons 
that travel in the +3- and −3-directions are respectively 

$$\nu_\pm = \frac{(1 \pm v/c)\,\nu}{\sqrt{1 - v^2/c^2}} = \frac{1 \pm v/c}{\sqrt{1 - v^2/c^2}}\,\frac{E_A - E_B}{2h} ,$$

so the total energy of the final state is 

$$E_{\rm after} = E_B + \tfrac{1}{2} m_B v^2 + h\nu_+ + h\nu_- = E_B + \tfrac{1}{2} m_B v^2 + \frac{E_A - E_B}{\sqrt{1 - v^2/c^2}} ,$$

or, since we are assuming that $v \ll c$, 

$$E_{\rm after} = E_B + \tfrac{1}{2} m_B v^2 + (E_A - E_B)(1 + v^2/2c^2) .$$


The conservation of energy requires that $0 = E_{\rm before} - E_{\rm after}$, so 

$$0 = E_A - E_B + \tfrac{1}{2}(m_A - m_B)v^2 - (E_A - E_B)(1 + v^2/2c^2) .$$


In order for this to be possible with velocity-independent internal energies and 
masses, we must have 


$$m_A - m_B = (E_A - E_B)/c^2 \tag{4.4.1}$$


as was to be proved. 
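
The final step can be made explicit with a short rearrangement (a worked version of the argument above, using only the energy-conservation condition already written down). The velocity-independent terms $E_A - E_B$ cancel, leaving a single term proportional to $v^2$:

```latex
\begin{aligned}
0 &= E_A - E_B + \tfrac{1}{2}(m_A - m_B)v^2 - (E_A - E_B)\left(1 + \frac{v^2}{2c^2}\right) \\
  &= \tfrac{1}{2}v^2\left[(m_A - m_B) - \frac{E_A - E_B}{c^2}\right] .
\end{aligned}
```

Since this must hold for every small $v$, the bracket must vanish, which is Eq. (4.4.1).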

Despite our use of the approximation $v \ll c$, Eq. (4.4.1) is not an approximate 
result. No one can stop us from making a Lorentz transformation with an 
arbitrarily small velocity, so we can reduce any error we have made along the 
way in deriving Eq. (4.4.1) to be as small as we like, simply by making v/c 
sufficiently small. 

Equation (4.4.1) is not yet the famous $E = mc^2$. As long as we are dealing 
only with a single body changing its state, as in the above Einstein thought 
experiment, it is only changes in its energy that matter for the conservation of 
energy, not the energy itself, and we might as well define the energy of any one 
state, say the lowest state, as $mc^2$. But $E = mc^2$ goes beyond Einstein’s result 
(4.4.1) when we consider a reaction involving a number of bodies, coming into 
and going out of existence, and exchanging energy with each other. 


General Formulas for Energy and Momentum 


The question of the energy of a massive particle at rest is part of a larger 
question: what are the energy and momentum of a particle moving with arbitrary 
velocity? It is largely up to us what we want to call energy and momentum. 
Historically, as we saw in Section 2.1, physicists gave these names to certain 
quantities that they had found to be conserved. A three-vector at first called 
“quantity of motion” by Newton was found to be conserved as a consequence 
of the equality of action and reaction, and later became known as momentum. 
A rotationally invariant quantity at first called vis viva was found by Huygens 
to be conserved when bodies come into contact, and was later called kinetic 
energy. The concept of energy then had to be broadened to preserve the con- 
servation of energy in more general processes, as for instance by including 
potential energy. It is the conservation of energy and momentum that makes 
these concepts useful, whether we want to calculate how much fuel to use to 
boil a given mass of water or how fast an alpha particle must be traveling to 
give a certain velocity to a gold nucleus that it strikes. 

We are not in a position in this chapter to prove the conservation of whatever 
we call energy and momentum. As we shall see in Section 5.7 of the chapter on 
quantum mechanics, these conservation laws follow from the invariance of the 
laws of nature under translations in time and space. But we can here learn a lot 
from the requirement that the conservation of the total energy and momentum 
of a number of colliding particles must be Lorentz invariant. 

Hence, in order to express the momentum and energy of a body as functions 
of its velocity, we impose two conditions on these functions: 


• The conservation of energy and momentum is Lorentz invariant. That is, if 
one observer sees these quantities conserved, then so must any other observer 
related to the first by a Lorentz transformation. 

• For velocities much less than $c$, the momentum and (up to a constant term) 
the energy must be given by the same formulas as in Newtonian mechanics. 


To accomplish this, we shall assume that the momentum $\mathbf{p}$ and energy $E$ of 
a particle can be assembled into a four-component quantity $p^\mu$ with $p^0$ 
proportional to the energy $E$, which transforms just like the components of $\Delta x^\mu$. 
That is, in changing our spacetime coordinates from $x^\mu$ to $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the 
energy-momentum four-vector $p^\mu$ of any particle is changed to 


$$p'^\mu = \Lambda^\mu{}_\nu p^\nu . \tag{4.4.2}$$


If the observer who uses the coordinates $x^\mu$ sees that in a collision the momentum 
four-vectors $p_n^\mu$ of the various colliding particles satisfy the condition of 
total energy and momentum conservation, 


$$\sum_{n,\,{\rm before}} p_n^\mu - \sum_{n,\,{\rm after}} p_n^\mu = 0 ,$$


then an observer who uses coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$ will see that 


$$\sum_{n,\,{\rm before}} p_n'^\mu - \sum_{n,\,{\rm after}} p_n'^\mu = \Lambda^\mu{}_\nu\left[\sum_{n,\,{\rm before}} p_n^\nu - \sum_{n,\,{\rm after}} p_n^\nu\right] = 0 ,$$


and so will see energy and momentum again conserved. 

The transformation property (4.4.2) allows us to calculate the energy and 
momentum of a particle with an arbitrary velocity if we know its energy 
and momentum when it is at rest. At rest the spatial components $p^i$ must 
vanish (which way would this vector point?) and we can take $p^0$ to be some 
number that we shall temporarily call $N$, characterizing the type of particle. It 
follows that the momentum four-vector of a particle with velocity $\mathbf{v}$ is given by 


$$p^\mu(\mathbf{v}) = \Lambda^\mu{}_0(\mathbf{v})\,N ,$$


where $\Lambda(\mathbf{v})$ is the Lorentz transformation (the “boost”) that takes the particle 
at rest to velocity $\mathbf{v}$. In particular, for $\mathbf{v}$ in the 3-direction, $\Lambda(\mathbf{v})$ is the Lorentz 
transformation (4.2.6), so $\mathbf{p}$ is in the 3-direction, with value 


$$p^3(v) = \Lambda^3{}_0(v)\,N = \gamma(v/c)N , \tag{4.4.3}$$

and 

$$p^0(v) = \Lambda^0{}_0(v)\,N = \gamma N , \tag{4.4.4}$$


where again $\gamma = 1/\sqrt{1 - v^2/c^2}$. 

To implement the second condition above, we next consider the limit $v \ll c$. 
Here Eq. (4.4.3) gives 


$$p^3(v) = N\left[v/c + O(v^3/c^3)\right] .$$


In order for this to give the Newtonian result $p^3(v) = mv$ for $v \ll c$, we must 
take $N = mc$, so that 


$$\mathbf{p}(\mathbf{v}) = m\gamma\mathbf{v} = m\mathbf{v}\left[1 + v^2/2c^2 + \cdots\right] . \tag{4.4.5}$$

Also, for $v \ll c$, Eq. (4.4.4) now gives 

$$p^0 = mc\left[1 + v^2/2c^2 + O(v^4/c^4)\right] .$$


In order for this to give the Newtonian result $mv^2/2$ for the kinetic energy, we 
must choose the constant of proportionality between $p^0$ and $E$ so that $E = cp^0$, 
and hence 


$$E(v) = mc^2\gamma = mc^2 + mv^2/2 + 3mv^4/8c^2 + \cdots . \tag{4.4.6}$$


Note that we cannot leave out the term $mc^2$ in the energy (4.4.6), or change it 
to any other constant term. If we did, then $p^\mu$ would not satisfy the condition 
(4.4.2) for a four-vector, and the conservation of energy and momentum would 
not be Lorentz invariant. 

We can eliminate the velocity $v$ from Eqs. (4.4.5) and (4.4.6) to derive a 
relation between energy and momentum. Since $\gamma^2(1 - v^2/c^2) = 1$, we have 
$E^2 - \mathbf{p}^2c^2 = m^2c^4$, or in other words, 


$$E = \sqrt{\mathbf{p}^2c^2 + m^2c^4} . \tag{4.4.7}$$


This can also be derived directly by noting that $E^2 - c^2\mathbf{p}^2 = -c^2\eta_{\mu\nu}p^\mu p^\nu$ 
takes the value $m^2c^4$ in the reference frame in which the body is at rest, and is 
Lorentz invariant, so it takes the same value $m^2c^4$ in all reference frames. 
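
The relations (4.4.5) to (4.4.7) are easy to check numerically. The sketch below is only an illustration: the function name, the choice of CGS units, the electron as the test particle, and the numerical constants are assumptions made here, not taken from the text.

```python
import math

C = 2.99792458e10      # speed of light in cm/s (assumed value, CGS units)
M_E = 9.1093837e-28    # electron mass in g (assumed value, used only as a test particle)

def energy_momentum(m, v):
    """Return (E, p) for mass m and speed v, using E = m c^2 gamma and p = m gamma v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return m * C**2 * gamma, m * gamma * v

for beta in (0.001, 0.5, 0.99):
    E, p = energy_momentum(M_E, beta * C)
    # E^2 - p^2 c^2 should equal (m c^2)^2 at every speed, Eq. (4.4.7).
    invariant_ratio = (E**2 - (p * C) ** 2) / (M_E * C**2) ** 2
    # Kinetic energy E - m c^2 should approach m v^2 / 2 when beta << 1.
    kinetic_ratio = (E - M_E * C**2) / (0.5 * M_E * (beta * C) ** 2)
    print(beta, invariant_ratio, kinetic_ratio)
```

The first ratio stays at 1 for every speed, while the second is close to 1 only for small $\beta$, reflecting the higher-order terms in (4.4.6).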


$E = mc^2$ 


Einstein suggested in his 1905 paper that the reduction of mass accompanying 
the emission of energy might be detected by the study of radioactive salts. 
This proved difficult, because it is not easy to measure accurately the atomic 
weights of different states of a radioactive isotope. In the early 1930s it became 
possible to verify Einstein’s relation between energy and mass by studying 
reactions among stable isotopes, such as $^1{\rm H} + {}^7{\rm Li} \to 2\,{}^4{\rm He}$. The masses of 
the atoms of $^1$H, $^7$Li, and $^4$He are respectively $1.007825\,m_1$, $7.016003\,m_1$, and 
$4.002603\,m_1$, where $m_1$ is the mass of unit atomic weight, defined today as 
1/12 the mass of the carbon isotope $^{12}$C. The mass lost in this reaction is thus 
$\Delta m = 0.018622\,m_1 = 3.09 \times 10^{-26}$ g $= 17.3$ MeV/$c^2$. Thus it is expected 
that the kinetic energies of the two $^4$He nuclei in the final state should 
exceed the kinetic energies of the $^1$H and $^7$Li nuclei in the initial state by 
$\Delta m\,c^2 = 17.3$ MeV, and this is observed, verifying $E = mc^2$ and not just 
Eq. (4.4.1). 
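
The arithmetic behind these numbers can be reproduced in a few lines. In this sketch the atomic masses are those quoted above; the conversion factors for $m_1$ (in grams and in MeV) are assumed standard values, and the variable names are illustrative.

```python
M1_GRAMS = 1.66053907e-24   # mass of unit atomic weight m_1 in grams (assumed standard value)
M1_MEV = 931.494            # m_1 c^2 in MeV (assumed standard value)

# Atomic masses in units of m_1, as quoted in the text.
masses = {"H1": 1.007825, "Li7": 7.016003, "He4": 4.002603}

# Mass lost in 1H + 7Li -> 2 4He, in units of m_1.
delta_m = masses["H1"] + masses["Li7"] - 2 * masses["He4"]

print(f"delta m = {delta_m:.6f} m_1 = {delta_m * M1_GRAMS:.3e} g")
print(f"energy released = {delta_m * M1_MEV:.1f} MeV")
```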


Force 


Because of the presence of the factor $\gamma$ in Eqs. (4.4.5) and (4.4.6), the quantity 
$m\gamma$ is sometimes called the relativistic mass. I will not use this terminology, 
because it suggests that we can calculate the acceleration produced by any force 
just by replacing $m$ in Newton’s $F = ma$ with $m\gamma$, which is not the case. To 
find how bodies respond to forces in special relativity, we need to formulate a 
general Lorentz-invariant version of Newton’s second law. 

Though the time coordinate is Galilean invariant it is not Lorentz invariant, 
so neither is the time derivative $d/dt$. To replace the time derivative in Newton’s 
second law, we note that $d\tau$ is Lorentz invariant, where 


$$d\tau = \sqrt{-\eta_{\mu\nu}dx^\mu dx^\nu/c^2} = \sqrt{dt^2 - d\mathbf{x}^2/c^2} = \sqrt{dt^2 - dt^2\,v^2/c^2} = dt/\gamma . \tag{4.4.8}$$
So, in place of the Newtonian formula $d\mathbf{p}/dt = \mathbf{F}$, the requirement of Lorentz 
invariance suggests that 


$$\frac{dp^\mu}{d\tau} = F^\mu , \tag{4.4.9}$$


where $F^\mu$ is a four-vector with the same Lorentz-transformation properties as 
$\Delta x^\mu$ or $k^\mu$ or $p^\mu$. The space components of Eq. (4.4.9) give 


$$\gamma\,\frac{d\mathbf{p}}{dt} = \mathbf{F} , \tag{4.4.10}$$

but $\mathbf{p}$ is not just $m\mathbf{v}$, and the factor $\gamma$ in Eq. (4.4.10) is outside the time derivative. 

Incidentally, we do not need a special determination of the time component 
$F^0$. We have already noted that $\eta_{\mu\nu}p^\mu p^\nu = -m^2c^2$, so 


$$0 = \frac{d}{d\tau}\left(\eta_{\mu\nu}p^\mu p^\nu\right) = 2\,\eta_{\mu\nu}F^\mu p^\nu .$$

Hence 

$$0 = \eta_{\mu\nu}F^\mu p^\nu = \mathbf{F}\cdot\mathbf{p} - F^0E/c \tag{4.4.11}$$

and therefore 

$$F^0 = c\,\mathbf{F}\cdot\mathbf{p}/E = \mathbf{F}\cdot\mathbf{v}/c . \tag{4.4.12}$$


We will see in Section 4.6 how to construct the four-vector $F^\mu$ for the forces 
exerted by electric and magnetic fields on a moving charged particle. 


4.5 Photons as Particles 


As we saw in Section 3.2, Einstein in 1905 proposed that the energy of radiation 
of a given frequency $\nu$ is always an integer multiple of $h\nu$. This led to the further 
conjecture that the radiation consists of particles, later called photons, each with 
energy $h\nu$. A state with energy $nh\nu$ would then be interpreted as consisting of 
$n$ photons. 


Photon Momentum 


If we suppose that photons are real particles, then we need to work out the 
relation between their energy and the magnitude of their momentum. In order 
for the conservation of energy and momentum to be Lorentz invariant when 
photons interact with other particles, the photon energy $E$ and momentum $\mathbf{p}$ 
must form a four-vector $p^\mu$, with $p^0 = E/c$, just as for other particles. That 
is, in changing coordinates from $x^\mu$ to $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the photon momentum 
four-vector is changed to $p'^\mu = \Lambda^\mu{}_\nu p^\nu$. But we cannot work out formulas for 
the components of $p^\mu$ in the way we did for other particles, by expressing $p^\mu$ 
as a Lorentz transformation acting on the four-momentum of a particle at rest, 
because photons never can be at rest. 

Instead, we return to the starting point, that the energy of a quantum of 
radiation is proportional to the frequency. This implies that the time component 
of $p^\mu$ is proportional to the time component of another four-vector, the wave 
vector $k^\mu$ discussed in Section 4.3. Specifically, using the result $k^0 = \omega/c$ given 
there, we have 


$$p^0 = E/c = h\nu/c = \hbar\omega/c = \hbar k^0 .$$


Then in all Lorentz frames 

$$0 = p^0 - \hbar k^0 . \tag{4.5.1}$$


It is a general rule that if the time component $a^0$ of a four-vector $a^\mu$ vanishes in 
all coordinate systems, then the whole four-vector $a^\mu$ vanishes. For if for any 
arbitrary Lorentz transformation $\Lambda$ we have $a^0 = a'^0 = 0$ where $a'^\mu = \Lambda^\mu{}_\nu a^\nu$, 
then 


$$0 = a'^0 = \Lambda^0{}_i a^i ,$$

which implies that $a^i$ vanishes. (If $\mathbf{a} \neq 0$ we can rotate our coordinate axes so 
that the 3-axis is in the direction of $\mathbf{a}$, and take $\Lambda^\mu{}_\nu$ to be a Lorentz transformation 
(4.2.6) along this direction, in which case $0 = \beta\gamma|\mathbf{a}|$, so $\mathbf{a} = 0$.) The whole 
four-vector $a^\mu$ thus vanishes, as was to be proved. Taking $a^\mu = p^\mu - \hbar k^\mu$, we 
conclude then from Eq. (4.5.1) that the photon four-momentum is 


$$p^\mu = \hbar k^\mu \tag{4.5.2}$$

and in particular 

$$|\mathbf{p}| = \hbar|\mathbf{k}| = \hbar\omega/c = E/c . \tag{4.5.3}$$


This is just the relation between energy and momentum that we would expect 
from Eq. (4.4.7) if we treat the photon as a particle of zero mass. 


Compton Scattering 


If photons carry momentum, then when a photon is scattered by an electron 
at rest the electron should recoil. Suppose the incoming and outgoing photons 
have wave vectors $\mathbf{k}$ and $\mathbf{k}'$, respectively. According to Eq. (4.4.7), the energy 
of an electron of momentum $\mathbf{p}_e$ is given by 


$$E_e = \sqrt{\mathbf{p}_e^2c^2 + m_e^2c^4} . \tag{4.5.4}$$


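The conservation of energy in the scattering of a photon by an electron at rest 
requires that 

$$c\hbar|\mathbf{k}| + m_ec^2 = c\hbar|\mathbf{k}'| + \sqrt{\mathbf{p}_e^2c^2 + m_e^2c^4} ,$$


where $\mathbf{p}_e$ is the momentum of the recoiling electron. According to Eq. (4.5.2), 
the conservation of momentum gives 


$$\mathbf{p}_e = \hbar\mathbf{k} - \hbar\mathbf{k}' ,$$


so the conservation of energy becomes 


$$c\hbar|\mathbf{k}| + m_ec^2 = c\hbar|\mathbf{k}'| + \sqrt{c^2\hbar^2\left(|\mathbf{k}|^2 + |\mathbf{k}'|^2 - 2\cos\theta\,|\mathbf{k}||\mathbf{k}'|\right) + m_e^2c^4} ,$$


where $\theta$ is the angle between the initial and final photon wave vectors. Subtracting 
$c\hbar|\mathbf{k}'|$ from both sides and squaring, we have 


$$c^2\hbar^2\left(\mathbf{k}^2 - 2|\mathbf{k}||\mathbf{k}'| + \mathbf{k}'^2\right) + 2c^3\hbar m_e\left(|\mathbf{k}| - |\mathbf{k}'|\right) + m_e^2c^4 = c^2\hbar^2\left(\mathbf{k}^2 + \mathbf{k}'^2 - 2\cos\theta\,|\mathbf{k}||\mathbf{k}'|\right) + m_e^2c^4 .$$

Cancelling the terms $c^2\hbar^2\mathbf{k}^2$, $c^2\hbar^2\mathbf{k}'^2$, and $m_e^2c^4$ on both sides leaves us with 

$$|\mathbf{k}| - |\mathbf{k}'| = |\mathbf{k}||\mathbf{k}'|(1 - \cos\theta)\hbar/m_ec .$$


It is conventional to write this in terms of the wavelengths $\lambda = 2\pi/|\mathbf{k}|$ and 
$\lambda' = 2\pi/|\mathbf{k}'|$, and $h = 2\pi\hbar$: 


$$\lambda' - \lambda = (1 - \cos\theta)h/m_ec . \tag{4.5.5}$$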


The quantity $h/m_ec$ equals $2.426 \times 10^{-10}$ cm, and gives the increase in wavelength 
for a photon scattered at right angles to its original direction. This is 
known as the Compton wavelength of the electron, in honor of Arthur Holly 
Compton (1892-1962). 
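
To get a feeling for the size of the effect, the following sketch evaluates Eq. (4.5.5) for a 17 keV photon scattered through 90 degrees. The physical constants are assumed standard CGS values, and the function and variable names are illustrative rather than anything defined in the text.

```python
import math

H = 6.62607015e-27     # Planck constant in erg s (assumed standard value)
C = 2.99792458e10      # speed of light in cm/s (assumed standard value)
M_E = 9.1093837e-28    # electron mass in g (assumed standard value)
KEV = 1.602176634e-9   # 1 keV in erg (assumed standard value)

COMPTON_WAVELENGTH = H / (M_E * C)   # h / (m_e c), in cm

def scattered_wavelength(wavelength, theta):
    """Eq. (4.5.5): wavelength (cm) after scattering through angle theta (radians)."""
    return wavelength + (1.0 - math.cos(theta)) * COMPTON_WAVELENGTH

incoming = H * C / (17.0 * KEV)                       # wavelength of a 17 keV photon
outgoing = scattered_wavelength(incoming, math.pi / 2)

print(f"h/(m_e c)        = {COMPTON_WAVELENGTH:.4e} cm")
print(f"incoming         = {incoming:.4e} cm")
print(f"outgoing, 90 deg = {outgoing:.4e} cm")
print(f"fractional shift = {(outgoing - incoming) / incoming:.2%}")
```

The fractional shift comes out at a few percent, small but within reach of crystal diffraction measurements of X-ray wavelengths.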

Compton at Washington University studied the scattering of monochromatic 
X-ray photons, with energy 17 keV. These photons were created by X-ray flu- 
orescence: atoms of high atomic number, such as platinum, were exposed to a 
beam of high-energy electrons in a tube something like the cathode ray tubes 
used by Thomson (with whom Compton had worked at Cambridge). The beam 
of high-energy electrons knocked electrons out of these atoms, some from inner 
orbits. Then other electrons of nearly zero energy fell into these orbits, emitting 
monochromatic radiation, which, as we saw in our discussion of atomic number 
in Section 3.4, is at X-ray wavelengths for atoms with $Z \gg 1$. In Compton’s 
experiment these photons were directed at a graphite target, where they were 
scattered by an outer electron of the carbon atom. These outer electrons have 
energies of the order of an eV, or at most tens of eV, negligible compared with 
the 17 keV energy of the incoming X-ray photon, so they scattered the X-ray 
photons just as if they were at rest. The wavelength of the scattered photon was 
measured by diffraction scattering, using a single crystal as a diffraction grating. 
Compton’s experiment verified Eq. (4.5.5) in 1923, giving a significant boost to 
the acceptance of the quantum of light as a particle of zero mass. It was the 
chemist G. N. Lewis (1875-1946) who a few years later gave this particle the 
name “photon.” 

There are other types of particle with zero mass. One is the graviton, 
the quantum of gravitational radiation. This radiation has been observed, but 
there is unfortunately no prospect of observing its quantum nature in the 
foreseeable future. There are also eight types of gluons, massless particles that 
in our present Standard Model are supposed to mediate strong nuclear forces. 
They interact so strongly when pulled away from other strongly interacting 
particles that they cannot even in principle be observed in isolation, but there is 
plenty of indirect evidence of their existence. 


4.6 Electromagnetic Fields and Forces 


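Recall that Maxwell’s equations take the form 


$$\nabla\times\mathbf{B} - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} = \frac{4\pi}{c}\mathbf{J} , \qquad \nabla\cdot\mathbf{E} = 4\pi\rho , \tag{4.6.1}$$

$$\nabla\times\mathbf{E} + \frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} = 0 , \qquad \nabla\cdot\mathbf{B} = 0 , \tag{4.6.2}$$


where $\mathbf{E}$ and $\mathbf{B}$ are the electric and magnetic fields, while $\rho$ and $\mathbf{J}$ are the 
densities of electric charge and electric current. Are these equations Lorentz 
invariant, as required by Einsteinian relativity? 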

That is not quite the right question. We have no a priori knowledge of the 
Lorentz-transformation properties of the electric and magnetic fields. The real 
question that confronts us here is: what Lorentz-transformation properties can 
be supposed for the fields and densities in these equations that will make the 
equations Lorentz invariant? In the course of answering this question, we will 
encounter some algebraic devices that are useful in judging the Lorentz invari- 
ance of all sorts of field theories. 


Let’s start by considering the charge density $\rho(\mathbf{x}, t)$ and current density $\mathbf{J}(\mathbf{x}, t)$ 
appearing on the right-hand sides of the Maxwell equations (4.6.1). Following 
the same arguments as in Section 2.5, because electric charge is conserved these 
satisfy a continuity equation like Eq. (2.5.2): 


$$\frac{\partial}{\partial t}\rho(\mathbf{x}, t) + \nabla\cdot\mathbf{J}(\mathbf{x}, t) = 0 . \tag{4.6.3}$$


This can be derived directly from the inhomogeneous Maxwell equations 
(4.6.1); just add $c$ times the divergence of the first equation to the time derivative 
of the second equation. So how should $\rho(\mathbf{x}, t)$ and $\mathbf{J}(\mathbf{x}, t)$ behave under Lorentz 
transformations in order for Eq. (4.6.3) to be Lorentz invariant? 
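
Spelled out, the derivation of the continuity equation from Eq. (4.6.1) runs as follows (a short worked version of the step just described, using the identity $\nabla\cdot(\nabla\times\mathbf{B}) = 0$):

```latex
\begin{aligned}
c\,\nabla\cdot\left(\nabla\times\mathbf{B} - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t}\right)
  &= -\frac{\partial}{\partial t}\,\nabla\cdot\mathbf{E} = 4\pi\,\nabla\cdot\mathbf{J} , \\
\frac{\partial}{\partial t}\,\nabla\cdot\mathbf{E} &= 4\pi\,\frac{\partial\rho}{\partial t} , \\
\text{sum:}\qquad 0 &= 4\pi\left(\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{J}\right) .
\end{aligned}
```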

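It helps to put the continuity equation in a revealing four-dimensional form. 
Define a four-component quantity $J^\mu(x)$ with $J^0(x) = c\rho(x)$. Then, recalling 
that $x^0 = ct$, Eq. (4.6.3) reads 


$$\frac{\partial}{\partial x^\mu}J^\mu(x) = 0 , \tag{4.6.4}$$

with repeated indices summed as usual over the values 1, 2, 3, 0. Now, how does 
the partial derivative $\partial/\partial x^\mu$ transform if we perform a Lorentz transformation 
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$? The chain rule of partial differentiation tells us that 

$$\frac{\partial}{\partial x^\nu} = \frac{\partial x'^\mu}{\partial x^\nu}\frac{\partial}{\partial x'^\mu} ,$$

so in our case 

$$\frac{\partial}{\partial x^\nu} = \Lambda^\mu{}_\nu\frac{\partial}{\partial x'^\mu} . \tag{4.6.5}$$

Therefore, if we suppose that $J^\mu(x)$ transforms as a four-vector under the 
Lorentz transformation $x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$, in the sense that the current 
$J'^\mu(x')$ measured by an observer who uses spacetime coordinates $x'^\mu$ is 


$$J'^\mu(x') = \Lambda^\mu{}_\nu J^\nu(x) , \tag{4.6.6}$$


then 

$$\frac{\partial}{\partial x'^\mu}J'^\mu(x') = \Lambda^\mu{}_\nu\frac{\partial}{\partial x'^\mu}J^\nu(x) = \frac{\partial}{\partial x^\nu}J^\nu(x) .$$
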
This is the Lorentz transformation of what is called a scalar. The quantity 
$\partial J^\mu/\partial x^\mu$ is seen by different observers to have the same value at the same 
point in spacetime, although these observers use different spacetime coordinate 
systems to label that point. 

So, if an observer who uses spacetime coordinates $x^\mu$ sees $\partial J^\mu/\partial x^\mu$ 
vanish at some particular value $X^\mu$ of these coordinates, then an observer who 
uses spacetime coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$ will see $\partial J'^\mu/\partial x'^\mu$ vanish at the 
corresponding coordinates $X'^\mu = \Lambda^\mu{}_\nu X^\nu$. In particular, if the first observer sees 
$\partial J^\mu/\partial x^\mu$ vanish everywhere, then so will any other observer whose coordinates 
are related to those of the first observer by a Lorentz transformation. 
So the Lorentz transformation (4.6.6) does make the conservation condition 
(4.6.4) Lorentz invariant. 




The Inhomogeneous Maxwell Equations 


We next consider how to rewrite the inhomogeneous Maxwell equations (4.6.1). 
The Lorentz invariance of these equations requires that we give $\mathbf{E}$ and $\mathbf{B}$ 
Lorentz-transformation properties such that their first derivatives with respect 
to space and time coordinates can be assembled into a four-component field 
that transforms as a four-vector, in the same sense (4.6.6) as $J^\mu$. We cannot 
assemble a four-vector from the six components of $\mathbf{E}$ and $\mathbf{B}$ themselves, but 
we can assemble them into a different sort of quantity, an antisymmetric array 
$F^{\mu\nu}(x) = -F^{\nu\mu}(x)$ with two vector indices. We take 


$$E_1 = F^{01} = -F^{10} , \qquad E_2 = F^{02} = -F^{20} , \qquad E_3 = F^{03} = -F^{30} , \tag{4.6.7}$$

$$B_1 = F^{23} = -F^{32} , \qquad B_2 = F^{31} = -F^{13} , \qquad B_3 = F^{12} = -F^{21} , \tag{4.6.8}$$


and $F^{\mu\nu} = 0$ if $\mu = \nu$. In this notation the 3-component of the first of the 
inhomogeneous equations (4.6.1) reads 


$$\frac{4\pi}{c}J_3 = \frac{\partial B_2}{\partial x^1} - \frac{\partial B_1}{\partial x^2} - \frac{\partial E_3}{\partial x^0} = \frac{\partial F^{3\nu}}{\partial x^\nu} ,$$


with the understanding that in accordance with the summation convention the 
repeated index $\nu$ is summed over the values 1, 2, 3, 0, with the $\nu = 3$ term 
here vanishing because $F^{33} = 0$. The same applies to the 1-component and 
2-component of the first of equations (4.6.1). Further, in this notation the second 
of equations (4.6.1) reads 


$$\frac{4\pi}{c}J^0 = \frac{\partial E_1}{\partial x^1} + \frac{\partial E_2}{\partial x^2} + \frac{\partial E_3}{\partial x^3} = \frac{\partial F^{0\nu}}{\partial x^\nu} .$$

So, in this notation all of the inhomogeneous Maxwell equations (4.6.1) can be 
summarized in the single four-component equation 


$$\frac{\partial}{\partial x^\nu}F^{\mu\nu}(x) = \frac{4\pi}{c}J^\mu(x) . \tag{4.6.9}$$

It is now almost obvious how to make the inhomogeneous Maxwell equations 
Lorentz invariant. We suppose that under a Lorentz transformation 
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$ the field $F^{\mu\nu}(x)$ transforms like $J^\mu(x)$, but with a pair of four-valued 
indices. That is, the observer who uses coordinates $x'^\mu$ measures electric and 
magnetic fields with 


$$F'^{\mu\nu}(x') = \Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma F^{\rho\sigma}(x) . \tag{4.6.10}$$


Fields with this sort of transformation property are known as tensors. 

To see that this makes Eq. (4.6.9) Lorentz invariant, consider a general 
Lorentz transformation $x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$. Multiplying Eq. (4.6.9) with $\Lambda^\rho{}_\mu$ 
and using Eq. (4.6.5) again to set $\partial/\partial x^\nu = \Lambda^\sigma{}_\nu\,\partial/\partial x'^\sigma$ gives 


$$\Lambda^\rho{}_\mu\Lambda^\sigma{}_\nu\frac{\partial}{\partial x'^\sigma}F^{\mu\nu}(x) = \frac{4\pi}{c}\Lambda^\rho{}_\mu J^\mu(x) .$$

Using the transformation properties (4.6.6) and (4.6.10), this becomes 


$$\frac{\partial}{\partial x'^\sigma}F'^{\rho\sigma}(x') = \frac{4\pi}{c}J'^\rho(x') .$$

Thus Eq. (4.6.9) holds in the frame of reference with coordinates $x'^\mu$ if it holds 
in the frame of reference with coordinates $x^\mu$, which is what we mean when 
we say it is Lorentz invariant. This then is a partial answer to our question: the 
inhomogeneous Maxwell equations (4.6.1) are Lorentz invariant if the electric 
and magnetic fields transform as components (4.6.7) and (4.6.8) of an antisymmetric 
tensor field. 

This represents a unification of electricity and magnetism beyond anything 
of which Oersted, Ampère, Faraday, or even Maxwell could have dreamed. Not 
only are electric and magnetic fields coupled in the field equations; putting an 
observer into motion can change electric or magnetic fields into combinations of 
both electric and magnetic fields. For example, suppose an observer using coordinates 
$x^\mu$ finds a uniform electric field $\mathbf{E}$ in the 1-direction, and no magnetic 
field, so that the only non-vanishing component of $F^{\mu\nu}$ is $F^{01} = -F^{10} = E_1$. 
Suppose a second observer uses coordinates $x'^\mu = \Lambda^\mu{}_\nu(v)x^\nu$, where $\Lambda^\mu{}_\nu(v)$ 
is the Lorentz transformation (4.2.6) that gives a body at rest a velocity $v$ in the 
3-direction, whose non-vanishing components are: 


$$\Lambda^3{}_3 = \Lambda^0{}_0 = \gamma , \qquad \Lambda^3{}_0 = \Lambda^0{}_3 = \beta\gamma , \qquad \Lambda^1{}_1 = \Lambda^2{}_2 = 1 ,$$


where $\beta = v/c$, and $\gamma$ is again the positive quantity $\gamma = +1/\sqrt{1 - \beta^2}$. The 
second observer sees an electromagnetic field 


$$F'^{\mu\nu} = \Lambda^\mu{}_\rho(v)\Lambda^\nu{}_\sigma(v)F^{\rho\sigma} = \left(\Lambda^\mu{}_0(v)\Lambda^\nu{}_1(v) - \Lambda^\mu{}_1(v)\Lambda^\nu{}_0(v)\right)F^{01} .$$

Its only non-vanishing components in this case are 

$$E'_1 = F'^{01} = -F'^{10} = \Lambda^0{}_0E_1 = \gamma E_1 ,$$
$$B'_2 = F'^{31} = -F'^{13} = \Lambda^3{}_0E_1 = \beta\gamma E_1 .$$


Not only is the electric field increased; a magnetic field appears where before 
there was none. This is the sort of thing that had led Einstein to his 1905 paper. 
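
The example above can be reproduced by applying the tensor transformation rule (4.6.10) directly. The sketch below is an illustration only: the helper names are invented here, and the arrays use index order (0, 1, 2, 3) with 0 as the time index, a coding convenience that differs from the text's listing of indices as 1, 2, 3, 0.

```python
import math

def boost_3(beta):
    """Lorentz transformation (4.2.6) along the 3-direction, as a 4x4 nested list L[mu][nu]."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    L = [[0.0] * 4 for _ in range(4)]
    L[0][0] = L[3][3] = gamma
    L[0][3] = L[3][0] = beta * gamma
    L[1][1] = L[2][2] = 1.0
    return L

def field_tensor(E, B):
    """Antisymmetric F[mu][nu] from Eqs. (4.6.7)-(4.6.8): E_i = F^{0i}, B_1 = F^{23}, and so on."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = E[i]
        F[i + 1][0] = -E[i]
    F[2][3], F[3][1], F[1][2] = B
    F[3][2], F[1][3], F[2][1] = -B[0], -B[1], -B[2]
    return F

def transform(L, F):
    """Eq. (4.6.10): F'^{mu nu} = L^mu_rho L^nu_sigma F^{rho sigma}."""
    return [[sum(L[m][r] * L[n][s] * F[r][s] for r in range(4) for s in range(4))
             for n in range(4)] for m in range(4)]

beta = 0.6   # sample velocity v/c, chosen so that gamma = 1.25
F_prime = transform(boost_3(beta), field_tensor(E=(1.0, 0.0, 0.0), B=(0.0, 0.0, 0.0)))
E1_prime = F_prime[0][1]   # should equal gamma * E_1
B2_prime = F_prime[3][1]   # should equal beta * gamma * E_1
print(E1_prime, B2_prime)
```

With a pure electric field of unit strength in the 1-direction, the boosted observer finds both a larger $E'_1$ and a non-zero $B'_2$, as in the formulas above.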


Upstairs, Downstairs 


We still have to verify that, with electromagnetic fields obeying the trans- 
formation rule (4.6.10) that makes the inhomogeneous Maxwell equations 
(4.6.1) Lorentz invariant, the homogeneous Maxwell equations (4.6.2) are also 
Lorentz invariant. To check this, we need to widen our ideas about vectors and 
tensors. 
In general, we define a four-vector field $V^\mu(x)$ as a quantity that has the same 
Lorentz-transformation property as $\Delta x^\mu$ or $p^\mu$ or $J^\mu$: 

$$V^\mu(x) \to V'^\mu(x') = \Lambda^\mu{}_\nu V^\nu(x) .$$


There is a different kind of four-vector field, conventionally written with a lower 
index, that transforms according to 


$$U_\mu(x) \to U'_\mu(x') = \Lambda_\mu{}^\nu U_\nu(x) , \tag{4.6.11}$$

where $\Lambda_\mu{}^\nu$ is the transposed inverse of the matrix $\Lambda^\mu{}_\nu$, in the sense that 

$$\Lambda_\mu{}^\rho\Lambda^\mu{}_\sigma = \Lambda^\rho{}_\mu\Lambda_\sigma{}^\mu = \delta^\rho_\sigma = \begin{cases} 1 & \rho = \sigma \\ 0 & \rho \neq \sigma \end{cases} \tag{4.6.12}$$


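The classic example of a vector that is naturally defined with a lower index is 
the partial derivative. If we multiply Eq. (4.6.5) with $\Lambda_\rho{}^\nu$, sum over the repeated 
index $\nu$, and use Eq. (4.6.12), we find 

$$\frac{\partial}{\partial x'^\rho} = \Lambda_\rho{}^\nu\frac{\partial}{\partial x^\nu} . \tag{4.6.13}$$

It is trivial to calculate the transposed inverse $\Lambda_\mu{}^\nu$ of any given Lorentz 
transformation $\Lambda^\mu{}_\nu$. To see this, recall the defining characteristic (4.2.5) of 
Lorentz transformations: 


$$\eta_{\mu\nu}\Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma = \eta_{\rho\sigma} .$$

Multiplying with $\Lambda_\kappa{}^\rho$, summing over $\rho$, and using Eq. (4.6.12) gives 

$$\eta_{\kappa\nu}\Lambda^\nu{}_\sigma = \eta_{\rho\sigma}\Lambda_\kappa{}^\rho . \tag{4.6.14}$$

That is, for $i$ and $j$ each running over 1, 2, 3: 

$$\Lambda_i{}^j = \Lambda^i{}_j , \qquad \Lambda_0{}^i = -\Lambda^0{}_i , \qquad \Lambda_i{}^0 = -\Lambda^i{}_0 , \qquad \Lambda_0{}^0 = \Lambda^0{}_0 .$$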
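
These sign rules can be checked for a concrete boost. In the sketch below the transposed inverse is built from Eq. (4.6.14) by multiplying with the metric on both sides; the helper names are illustrative, and the arrays use index order (0, 1, 2, 3) with 0 as the time index, which is a coding assumption rather than the text's ordering.

```python
import math

# Metric eta with eta_00 = -1 and +1 for the space components; it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Plain 4x4 matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def boost_3(beta):
    """Lorentz transformation (4.2.6) along the 3-direction, L[mu][nu] = Lambda^mu_nu."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    L = [[0.0] * 4 for _ in range(4)]
    L[0][0] = L[3][3] = gamma
    L[0][3] = L[3][0] = beta * gamma
    L[1][1] = L[2][2] = 1.0
    return L

def transposed_inverse(L):
    """Lbar[kappa][rho] = Lambda_kappa^rho = eta_{kappa nu} Lambda^nu_sigma eta^{sigma rho}."""
    return matmul(matmul(ETA, L), ETA)

L = boost_3(0.6)
Lbar = transposed_inverse(L)

# Eq. (4.6.12): summing Lambda_mu^rho Lambda^mu_sigma over mu should give the unit matrix.
unit = [[sum(Lbar[m][r] * L[m][s] for m in range(4)) for s in range(4)] for r in range(4)]
print(unit)
print(Lbar[0][3], -L[0][3])   # mixed time-space components change sign
```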


In general, a tensor can have both upper and lower indices, and transforms 
with a $\Lambda$ or its transposed inverse for each. For instance, a tensor $t^{\mu\nu}{}_\kappa$ has the 
transformation property 


$$t^{\mu\nu}{}_\kappa \to \Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma\Lambda_\kappa{}^\tau\,t^{\rho\sigma}{}_\tau .$$


If we set an upper index equal to a lower index and (following the summation 
convention) sum over this index, we get another tensor with one less upper index 
and one less lower index. For instance, in the above example, if we set $\nu = \kappa$ 
and sum, we obtain a quantity $v^\mu \equiv t^{\mu\nu}{}_\nu$, with the transformation property of a 
tensor with one index, that is, a vector: 


$$v^\mu \to \Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma\Lambda_\nu{}^\tau\,t^{\rho\sigma}{}_\tau = \Lambda^\mu{}_\rho\,\delta^\tau_\sigma\,t^{\rho\sigma}{}_\tau = \Lambda^\mu{}_\rho\,v^\rho ,$$

as required for a vector. One case has been already encountered: if we define a 
tensor $t^\mu{}_\nu = \partial J^\mu/\partial x^\nu$ and set the upper and lower indices equal and sum, we 
obtain a quantity that we already know is a scalar: $t^\mu{}_\mu = \partial J^\mu/\partial x^\mu$. 
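Although it is important not to confuse upper and lower indices, the difference 
between them is just a matter of the sign of the time components. We 
can use the matrix $\eta_{\mu\nu}$ to lower an index on any tensor, giving a new tensor. For 
instance, returning to our earlier example, if $t^{\mu\nu}{}_\kappa$ is a tensor with transformation 
property 

$$t^{\mu\nu}{}_\kappa \to \Lambda^\mu{}_\rho\Lambda^\nu{}_\sigma\Lambda_\kappa{}^\tau\,t^{\rho\sigma}{}_\tau ,$$

we can lower the index $\nu$, defining a new tensor: 

$$u^\mu{}_{\nu\kappa} \equiv \eta_{\nu\lambda}\,t^{\mu\lambda}{}_\kappa .$$

Using Eq. (4.6.14), we see that this has the transformation property 

$$u^\mu{}_{\nu\kappa} \to \eta_{\nu\lambda}\Lambda^\mu{}_\rho\Lambda^\lambda{}_\sigma\Lambda_\kappa{}^\tau\,t^{\rho\sigma}{}_\tau = \Lambda^\mu{}_\rho\Lambda_\nu{}^\lambda\Lambda_\kappa{}^\tau\,\eta_{\lambda\sigma}\,t^{\rho\sigma}{}_\tau = \Lambda^\mu{}_\rho\Lambda_\nu{}^\lambda\Lambda_\kappa{}^\tau\,u^\rho{}_{\lambda\tau} ,$$

as is appropriate for a tensor with one upper index and two lower indices. (It is 
also possible to raise any lower indices on tensors, but we won’t need to do this 
here.) 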


The Homogeneous Maxwell Equations 

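With our new-found power to lower indices, let us introduce a new tensor: 

$$H_{\mu\nu\lambda} \equiv \frac{\partial F_{\mu\nu}}{\partial x^\lambda} + \frac{\partial F_{\nu\lambda}}{\partial x^\mu} + \frac{\partial F_{\lambda\mu}}{\partial x^\nu} . \tag{4.6.15}$$

It is easy to see that $H_{\mu\nu\lambda}$ is totally antisymmetric. For instance, interchanging 
$\mu$ and $\lambda$ gives 

$$H_{\lambda\nu\mu} = \frac{\partial F_{\lambda\nu}}{\partial x^\mu} + \frac{\partial F_{\nu\mu}}{\partial x^\lambda} + \frac{\partial F_{\mu\lambda}}{\partial x^\nu} = -\frac{\partial F_{\nu\lambda}}{\partial x^\mu} - \frac{\partial F_{\mu\nu}}{\partial x^\lambda} - \frac{\partial F_{\lambda\mu}}{\partial x^\nu} = -H_{\mu\nu\lambda} .$$

Therefore $H_{\mu\nu\lambda}$ vanishes unless all three indices are unequal. In four spacetime 
dimensions, this means that $H_{\mu\nu\lambda}$ has only four independent components. 
Lowering the indices in Eqs. (4.6.7) and (4.6.8), we have 


$$E_1 = -F_{01} , \quad E_2 = -F_{02} , \quad E_3 = -F_{03} , \qquad B_1 = F_{23} , \quad B_2 = F_{31} , \quad B_3 = F_{12} , \tag{4.6.16}$$


so 


$$H_{123} = \frac{\partial F_{12}}{\partial x^3} + \frac{\partial F_{23}}{\partial x^1} + \frac{\partial F_{31}}{\partial x^2} = \frac{\partial B_3}{\partial x^3} + \frac{\partial B_1}{\partial x^1} + \frac{\partial B_2}{\partial x^2} = \nabla\cdot\mathbf{B} ,$$

$$H_{120} = \frac{\partial F_{12}}{\partial x^0} + \frac{\partial F_{20}}{\partial x^1} + \frac{\partial F_{01}}{\partial x^2} = \frac{1}{c}\frac{\partial B_3}{\partial t} + \frac{\partial E_2}{\partial x^1} - \frac{\partial E_1}{\partial x^2} = \left[\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right]_3 .$$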
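Likewise 

$$H_{230} = \left[\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right]_1 , \qquad H_{310} = \left[\frac{1}{c}\frac{\partial\mathbf{B}}{\partial t} + \nabla\times\mathbf{E}\right]_2 .$$


Hence the homogeneous Maxwell equations are the same as the requirement 
that, for all $\mu$, $\nu$, and $\lambda$, 


$$H_{\mu\nu\lambda} = 0 . \tag{4.6.17}$$


This is a manifestly Lorentz-invariant condition; if $H_{\mu\nu\lambda}(x)$ vanishes, then so 
does $H'_{\mu\nu\lambda}(x') = \Lambda_\mu{}^\rho\Lambda_\nu{}^\sigma\Lambda_\lambda{}^\kappa H_{\rho\sigma\kappa}(x)$. 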

(We will not here use the formalism of differential forms, but for anyone 
interested in this subject, I mention in passing that a completely antisymmetric 
tensor with $p$ lower indices is known as a $p$-form. Thus $F_{\mu\nu}$ is a 2-form, and 
$H_{\mu\nu\lambda}$ is a 3-form. Given a $p$-form, we can form a $(p+1)$-form, known as the 
exterior derivative, by taking the spacetime derivative and antisymmetrizing. 
Thus, $H_{\mu\nu\lambda}$ is the exterior derivative of $F_{\mu\nu}$. A $p$-form whose exterior derivative 
vanishes is said to be closed; a $p$-form that can be written as the exterior 
derivative of a $(p-1)$-form is said to be exact. Thus $H_{\mu\nu\lambda}$ is exact, and the 
homogeneous Maxwell equations (4.6.2) tell us that $F_{\mu\nu}$ is closed. It is easy to 
see that, because partial derivatives commute, any exact $p$-form is closed, and 
a profound theorem due to Poincaré tells us that in simply connected spaces 
any closed $p$-form is exact but that this is not necessarily true in spaces with 
more complicated topology.$^{11}$ In electrodynamics, since $F_{\mu\nu}$ is closed, we can 
conclude that in ordinary spacetime it is exact, so it can be written as the 
exterior derivative of a 1-form $A_\mu$, known as the four-vector potential; that 
is, $F_{\mu\nu} = \partial A_\nu/\partial x^\mu - \partial A_\mu/\partial x^\nu$. Maxwell originally wrote his equations as 
differential equations for $\mathbf{A}$ and $A^0$, not $\mathbf{E}$ and $\mathbf{B}$.) 
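
The statement that an exact form is closed can be verified directly in this case. Writing $\partial_\mu$ as shorthand for $\partial/\partial x^\mu$ (a notation introduced here only for brevity) and inserting $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ into the cyclic sum (4.6.15), the terms cancel in pairs because partial derivatives commute:

```latex
\begin{aligned}
H_{\mu\nu\lambda}
 &= \partial_\lambda(\partial_\mu A_\nu - \partial_\nu A_\mu)
  + \partial_\mu(\partial_\nu A_\lambda - \partial_\lambda A_\nu)
  + \partial_\nu(\partial_\lambda A_\mu - \partial_\mu A_\lambda) \\
 &= (\partial_\lambda\partial_\mu - \partial_\mu\partial_\lambda)A_\nu
  + (\partial_\mu\partial_\nu - \partial_\nu\partial_\mu)A_\lambda
  + (\partial_\nu\partial_\lambda - \partial_\lambda\partial_\nu)A_\mu = 0 .
\end{aligned}
```

So any field derived from a four-vector potential satisfies the homogeneous Maxwell equations automatically.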


Electric and Magnetic Forces 


We saw in Section 4.4 that in special relativity Newton’s $F = ma$ is replaced 
with the Lorentz-invariant formula (4.4.9) 

$$\frac{dp^\mu}{d\tau} = F^\mu , \tag{4.6.18}$$


where $p^\mu$ is the four-vector of energy and momentum, $c\,d\tau \equiv \sqrt{-\eta_{\mu\nu}dx^\mu dx^\nu}$, 
and $F^\mu$ is a four-vector subject to the constraint 


$$\eta_{\mu\nu}p^\mu F^\nu = 0 . \tag{4.6.19}$$


11 For a more thorough treatment, see e.g. H. Flanders, Differential Forms (Academic Press, New York, 
1963). 

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So, what should we take for F” in the case of a particle of charge g in a space 
that is empty except for being pervaded by electric and magnetic fields? Just 
as for the momentum four-vector of massive particles, the force four-vector is 
uniquely determined by the condition that it takes the known form for a particle 
at rest, and is a four-vector, so that it is given by a Lorentz transformation for a 
particle of any velocity. 

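There is an obvious four-vector that is linear in the electric and magnetic
fields and (because $F^{\mu\nu} = -F^{\nu\mu}$) satisfies Eq. (4.6.19):

$$ f^\mu \equiv \eta_{\nu\rho}\, F^{\mu\nu} p^\rho , $$

so we can guess that $F^\mu \propto f^\mu$. To check that this gives the right answer for a
particle at rest and to find the coefficient of proportionality, let us evaluate $f^\mu$
for a particle at rest, for which $\mathbf{p} = 0$ and $p^0 = mc$. In this limit

$$ f^i \to -mcF^{i0} = mcE_i , \qquad f^0 \to -mcF^{00} = 0 , $$

with $i = 1, 2, 3$. Therefore to have agreement with the familiar formula $d\mathbf{p}/dt = q\mathbf{E}$
for the acceleration of a particle of charge $q$ and zero velocity by an electric
field, we take

$$ F^\mu = \frac{q}{mc} f^\mu = \frac{q}{mc}\,\eta_{\nu\rho}\, F^{\mu\nu} p^\rho . $$ (4.6.20)

That is, for a general velocity,

$$ F^i = \frac{q}{mc}\left[F^{ij}p^j - F^{i0}p^0\right] , $$ (4.6.21)

and in three-vector notation, recalling that $p^0 = mc\gamma$, $\mathbf{p} = m\gamma\mathbf{v}$,

$$ \mathbf{F} = \frac{q}{mc}\left[p^0\mathbf{E} + \mathbf{p}\times\mathbf{B}\right] = q\gamma\left[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c\right] . $$ (4.6.22)

Since $d\tau = dt/\gamma$, this gives

$$ \frac{d}{dt}\left(m\gamma\mathbf{v}\right) = q\left[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c\right] . $$ (4.6.23)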


Given the existence of the force exerted by electric fields, the force exerted 
by magnetic fields is an inevitable consequence of Lorentz invariance. It is a 
special feature of electromagnetic forces that the only change in the equation 
of motion introduced by special relativity is the replacement of the mass $m$ in
the momentum with $m\gamma$, which in this one case allows us to treat $m\gamma$ as a
relativistic mass.
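
The step from the covariant force (4.6.20) to the three-vector form (4.6.22) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the component conventions used above ($F^{0i} = E_i = -F^{i0}$, $F^{ij} = \epsilon_{ijk}B_k$, and $\eta_{\mu\nu}$ diagonal with elements $+1,+1,+1,-1$), and the function names and numerical values are purely illustrative.

```python
import math

def cross(a, b):
    """Ordinary three-vector cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def force_four_vector(q, m, c, v, E, B):
    """Evaluate F^mu = (q/mc) eta_{nu rho} F^{mu nu} p^rho, as in Eq. (4.6.20).

    Components are stored in the order (1, 2, 3, 0), so list index 3 is the
    time component. Returns (force, p, gamma).
    """
    gamma = 1.0 / math.sqrt(1.0 - sum(x * x for x in v) / c**2)
    p = [m * gamma * x for x in v] + [m * c * gamma]   # (p^1, p^2, p^3, p^0)
    eta = [1.0, 1.0, 1.0, -1.0]                         # diagonal of eta_{mu nu}
    # Field tensor (assumed convention): F^{ij} = eps_ijk B_k, F^{0i} = E_i = -F^{i0}
    F = [[0.0] * 4 for _ in range(4)]
    F[0][1], F[1][2], F[2][0] = B[2], B[0], B[1]
    F[1][0], F[2][1], F[0][2] = -B[2], -B[0], -B[1]
    for i in range(3):
        F[3][i] = E[i]
        F[i][3] = -E[i]
    force = [q / (m * c) * sum(F[mu][nu] * eta[nu] * p[nu] for nu in range(4))
             for mu in range(4)]
    return force, p, gamma

# Illustrative values only (arbitrary units, |v| < c).
q, m, c = 1.5, 2.0, 3.0
v, E, B = (0.6, -1.2, 0.9), (0.3, -0.7, 1.1), (-0.4, 0.5, 0.2)

force, p, gamma = force_four_vector(q, m, c, v, E, B)
vxB = cross(v, B)
lorentz = [q * gamma * (E[i] + vxB[i] / c) for i in range(3)]          # Eq. (4.6.22)
constraint = sum(s * pi * fi for s, pi, fi in zip([1, 1, 1, -1], p, force))  # Eq. (4.6.19)

print(force[:3], lorentz, constraint)
```

The spatial components of the covariant force should agree with $q\gamma[\mathbf{E} + \mathbf{v}\times\mathbf{B}/c]$, and the contraction in Eq. (4.6.19) should vanish up to rounding error.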


4.7 Causality 


We saw in Section 4.2 that no Lorentz transformation acting on a body at rest 
could give it a speed greater than c, the speed of light. We can derive a stronger 
result, that no influence whatever can travel faster than light. This is not just a
confession of technological inadequacy, but a consequence of an assumption of 
causality, that effects always come after causes. 


Invariance of Temporal Order 


Suppose that in some coordinate frame the difference between the spacetime
coordinates of an event and the event that causes it is $\Delta x^\mu$:

$$ x^\mu_{\rm effect} - x^\mu_{\rm cause} = \Delta x^\mu . $$

According to the principle of causality, we must have $\Delta t = \Delta x^0/c > 0$. Now
suppose we perform a Lorentz transformation $\Lambda^\mu{}_\nu$ that would give a body at
rest a velocity $\mathbf{v}$ in the direction opposite to the spatial separation $\Delta\mathbf{x}$. Without
loss of generality, we can rotate our coordinate system so that $\Delta\mathbf{x}$ and thus
$-\mathbf{v}$ are in the 3-direction. Then $\Lambda^\mu{}_\nu$ takes the form (4.2.6) with $\beta = -|\mathbf{v}|/c$,
and in the new coordinate frame the difference between the times of effect and
cause is

$$ \Delta t' = \Lambda^0{}_\mu \Delta x^\mu/c = \gamma\left[\Delta t - v|\Delta\mathbf{x}|/c^2\right] , $$ (4.7.1)

where $v = |\mathbf{v}|$ and $\gamma = 1/\sqrt{1 - v^2/c^2}$. Now, $v$ can be anything, except that
it must be less than $c$, so if $|\Delta\mathbf{x}|/c$ is greater than $\Delta t$ we could make $\Delta t'$
negative by taking $v$ in the range $1 > v/c > c\Delta t/|\Delta\mathbf{x}|$. So the observer using
coordinates $x'$ would see the effect precede the cause.
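
The sign change described here is easy to see with numbers. The sketch below simply evaluates Eq. (4.7.1) for one separation with $|\Delta\mathbf{x}|/c > \Delta t$ and one with $|\Delta\mathbf{x}|/c < \Delta t$; the function name and the sample values are illustrative assumptions, and units with $c = 1$ are used.

```python
import math

def transformed_time_difference(dt, dx, v, c=1.0):
    """Eq. (4.7.1): time difference between effect and cause after a boost of speed v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (dt - v * dx / c**2)

# Separation with |dx|/c > dt: any v/c above c*dt/|dx| = 0.5 reverses the order.
spacelike_slow = transformed_time_difference(dt=1.0, dx=2.0, v=0.4)
spacelike_fast = transformed_time_difference(dt=1.0, dx=2.0, v=0.8)

# Separation with |dx|/c < dt: the order is preserved for every v < c.
timelike = [transformed_time_difference(dt=2.0, dx=1.0, v=k / 100.0) for k in range(100)]

print(spacelike_slow, spacelike_fast, min(timelike))
```

With these values the boost at $v/c = 0.8$ gives a negative $\Delta t'$ for the first separation, while no boost with $v < c$ does so for the second.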

To rule this out, we must assume that the difference $\Delta x^\mu$ between the spacetime
coordinates of an event and the event that causes it satisfies the inequality

$$ |\Delta\mathbf{x}|/c \le \Delta t . $$ (4.7.2)

Whatever physical influence is exerted by the cause to produce the effect travels
at a speed $|\Delta\mathbf{x}|/\Delta t$; the inequality (4.7.2) says that the speed of this influence
must be no greater than $c$.

Fortunately, if the bound (4.7.2) is seen to be satisfied by one observer then it
is satisfied for any other observer related to the first by the sort of proper Lorentz
transformation discussed in this chapter. The inequality (4.7.2) is equivalent to
the inequality

$$ -\eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu = c^2(\Delta t)^2 - |\Delta\mathbf{x}|^2 \ge 0 . $$ (4.7.3)

This quantity is Lorentz invariant, so if Eq. (4.7.3) is satisfied for one observer
using coordinates $x^\mu$, the corresponding inequality must be satisfied for coordinates
$x'^\mu = \Lambda^\mu{}_\nu x^\nu$, so we must also have

$$ c^2(\Delta t')^2 \ge |\Delta\mathbf{x}'|^2 . $$ (4.7.4)

Since this gives a non-zero lower bound on $|\Delta t'|$, in order for $\Delta t'$ to have an
opposite sign from $\Delta t$ the Lorentz transformation would have to produce a discontinuous
jump in the coordinates. This is not possible for the sort of "proper"
Lorentz transformation that concerns us in this chapter, which as discussed in
Section 4.2 can be produced from the identity transformation $x \to x$ by a
smooth change of parameters. So if one observer sees $\Delta t > 0$ and $|\Delta\mathbf{x}|/c \le \Delta t$,
then any observer related to the first by a proper Lorentz transformation will see
$\Delta t' > 0$ and $|\Delta\mathbf{x}'|/c \le \Delta t'$.


Light Cone 


These conclusions are well illustrated by introducing the light cone, the spacetime
surface with $\eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu = 0$. Points outside the light cone fall on hyperboloids
with $\eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu = a > 0$. Any point on one of these hyperboloids
can be taken to any other point on the same hyperboloid (that is, with the same
value of $a$) by a proper Lorentz transformation, even if this entails a change
of sign of $\Delta x^0$. On these hyperboloids $|\Delta\mathbf{x}| > c|\Delta t|$, so it is not possible for
any influence traveling at less than the speed of light to traverse a spacetime
interval $\Delta x^\mu$ outside the light cone. Thus as long as we assume that physical
influences never travel faster than light, the circumstance that proper Lorentz
transformations can change the sign of $\Delta t$ outside the light cone presents no
challenge to causality.

Points inside the light cone fall on hyperboloids with $\eta_{\mu\nu}\,\Delta x^\mu \Delta x^\nu = b < 0$.
For each value of $b$ there are two disconnected hyperboloids, one inside the
future light cone, with $\Delta x^0 > 0$, and one inside the past light cone, with
$\Delta x^0 < 0$. Any point on one of these connected hyperboloids can be taken to
any other point on the same hyperboloid by a proper Lorentz transformation, but
proper Lorentz transformations cannot take us from inside the future light cone
to inside the past light cone. Causality requires that the difference $\Delta x^\mu$ in the
coordinates of an effect and its cause be on or within the future light cone, and
if one observer sees this to be the case then so will all other observers related to
the first by a proper Lorentz transformation.


5


Quantum Mechanics 


Our modern understanding of atoms, molecules, solids, atomic nuclei, and elementary
particles is largely based on quantum mechanics. Quantum mechanics
grew in the mid-1920s out of two independent developments: the 1925 matrix
mechanics of Werner Heisenberg¹ (1901-1976), and the 1926 wave mechanics
of Erwin Schrödinger² (1887-1961). For the most part in this chapter we will
follow the path of wave mechanics, which is far more convenient for all but the
simplest calculations. After a look at the historical inspiration for wave mechanics
in Section 5.1 the Schrödinger equation will be introduced in Section 5.2 and
used to derive not only the hydrogen energy levels found by Bohr but also their
degeneracy. The general principles of the wave mechanical formulation of quantum
mechanics are laid out in Section 5.3 and provide a basis for the discussion
of spin in Section 5.4, identical particles in Section 5.5, and scattering processes
in Section 5.6. In Section 5.7 the general principles are supplemented with the
canonical formalism, which is used in Section 5.8 to work out the Schrödinger
equation for charged particles in a general electromagnetic field. This will provide
us with examples of the application of a widely useful approximation
scheme, perturbation theory, which is outlined in general terms in Section 5.9.

The two approaches of wave and matrix mechanics were unified by Paul
Dirac (1902-1984) in a more abstract formalism, which he called transformation
theory.³ This has evolved into a modern approach in which physical states
are represented by vectors in an abstract space known as Hilbert space, with
wave functions arising as components of these vectors in a suitable basis. The
Hilbert space approach is briefly described in Section 5.10.


1 W. Heisenberg, Zeit. Phys. 33, 879 (1925). This article is reprinted in English in Van der Waerden, Sources
of Quantum Mechanics, listed in the bibliography.

2 E. Schrödinger, Ann. Physik 79, 361, 409 (1926). These articles are reprinted in English in Shearer,
Collected Papers on Wave Mechanics, listed in the bibliography.

3 This approach is described in Dirac, The Principles of Quantum Mechanics, listed in the bibliography.


5.1 De Broglie Waves


Free-Particle Wave Functions 


Wave mechanics can be traced to the 1923 Paris Ph.D. thesis of Louis de
Broglie⁴ (1892-1987). De Broglie was inspired by the quantum interpretation
of electromagnetic radiation. If an electromagnetic wave can somehow be
interpreted as a stream of particles, photons, then might not electrons, which
are undoubtedly particles, be described somehow as waves? As we saw in
Section 4.5, the momentum $\mathbf{p}$ and energy $E$ of the photons making up an
electromagnetic wave that is proportional to $\exp(i\mathbf{k}\cdot\mathbf{x} - i\omega t)$ are given by
$\mathbf{p} = \hbar\mathbf{k}$ and $E = \hbar\omega$, so the wave has the spacetime dependence

$$ \mathbf{E} \ \text{and} \ \mathbf{B} \propto \exp\left[i\mathbf{p}\cdot\mathbf{x}/\hbar - iEt/\hbar\right] , $$ (5.1.1)

plus the complex conjugates. De Broglie in his thesis suggested that an electron
of momentum $\mathbf{p}$ is associated with a complex wave function of similar form

$$ \psi_{\mathbf{p}}(\mathbf{x}, t) \propto \exp\left[i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right] , $$ (5.1.2)

where now the energy is not $c|\mathbf{p}|$, as for a photon, but rather is given by the
formula (4.5.4):

$$ E(\mathbf{p}) = \sqrt{m_e^2c^4 + \mathbf{p}^2c^2} , $$

with $m_e$ the electron mass.


Group Velocity 


The association of the wave (5.1.2) with a moving electron gained plausibility
from the remark that a localized packet of these waves travels with the velocity
of the electron. Consider a packet of these waves:

$$ \psi(\mathbf{x}, t) = \int d^3p \; g(\mathbf{p}) \exp\left[i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right] , $$ (5.1.3)

where $g(\mathbf{p})$ is a smooth function of momentum that is peaked at some value $\mathbf{P}$.
Suppose also that $g(\mathbf{p})$ is chosen so that at $t = 0$ the integral is peaked at $\mathbf{x} = 0$.
(This will be the case if $g(\mathbf{p})$ varies little over some range around $\mathbf{P}$ that is large
enough that if $\mathbf{x}$ is not near zero then the factor $\exp[i\mathbf{p}\cdot\mathbf{x}/\hbar]$ in Eq. (5.1.3)
at $t = 0$ will undergo many oscillations over the range of the integral, which
makes the integral exponentially small except near $\mathbf{x} = 0$.) Then, by expanding
the argument of the exponential around $\mathbf{P}$, we have

$$ E(\mathbf{p}) = E(\mathbf{P}) + \mathbf{V}\cdot(\mathbf{p} - \mathbf{P}) + \cdots , $$

where

$$ V_i \equiv \left[\frac{\partial E(\mathbf{p})}{\partial p_i}\right]_{\mathbf{p}=\mathbf{P}} . $$ (5.1.4)

This gives the wave function for $t \neq 0$:

$$ \psi(\mathbf{x}, t) \simeq \exp\left[-i\left[E(\mathbf{P}) - \mathbf{V}\cdot\mathbf{P}\right]t/\hbar\right] \int d^3p \; g(\mathbf{p}) \exp\left(i\mathbf{p}\cdot[\mathbf{x} - \mathbf{V}t]/\hbar\right) . $$ (5.1.5)

Because of the way we have constructed the packet function $g(\mathbf{p})$, the magnitude
of (5.1.5) is peaked at $\mathbf{x} = \mathbf{V}t$, which shows that the packet moves at
velocity $\mathbf{V}$, known as its group velocity. But $V_i = \partial E(\mathbf{p})/\partial p_i = c^2p_i/E(\mathbf{p})$,
which as shown in Eqs. (4.4.5) and (4.4.6) is indeed the velocity of a particle of
momentum $\mathbf{p}$.

4 L. de Broglie, Comptes Rendus Acad. Sci. 177, 507, 548, 630 (1923).
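
The claim that the packet peak moves at the group velocity can be tested by brute force. The sketch below is an illustration under stated assumptions, not the author's calculation: it works in one space dimension, uses units with $\hbar = c = m_e = 1$, takes $g(p)$ to be a Gaussian of width `sigma` centered at `P`, and replaces the integral (5.1.3) by a finite sum. All names and parameter values are hypothetical.

```python
import cmath
import math

def packet_peak(t, P=1.0, sigma=0.05, m=1.0, c=1.0, hbar=1.0):
    """Locate the maximum of |psi(x, t)| for a 1D version of the packet (5.1.3)."""
    n_p = 201
    ps = [P + sigma * (-5.0 + 10.0 * k / (n_p - 1)) for k in range(n_p)]
    gs = [math.exp(-(p - P) ** 2 / (2.0 * sigma ** 2)) for p in ps]   # assumed Gaussian g(p)
    Es = [math.sqrt(m**2 * c**4 + p**2 * c**2) for p in ps]           # E(p) from (4.5.4)
    best_x, best_amp = None, -1.0
    for j in range(501):                      # scan x from -20 to 80 in steps of 0.2
        x = -20.0 + 0.2 * j
        psi = sum(g * cmath.exp(1j * (p * x - E * t) / hbar)
                  for g, p, E in zip(gs, ps, Es))
        if abs(psi) > best_amp:
            best_x, best_amp = x, abs(psi)
    return best_x

P = 1.0
group_velocity = P / math.sqrt(1.0 + P**2)    # V = c^2 P / E(P) with c = m = 1
t = 40.0
print(packet_peak(0.0), packet_peak(t), group_velocity * t)
```

For these parameters the peak starts at $x = 0$ and at $t = 40$ sits close to $Vt$, to within the grid spacing and a small shift from the higher-order terms dropped in the expansion of $E(\mathbf{p})$.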


Application to Hydrogen 


De Broglie's hypothesis met with just one initial success. The electron in a hydrogen
atom is not free, but moves under the influence of the proton's attraction.
Nevertheless, de Broglie supposed that the electron is described by the free-particle
wave function (5.1.2), but with the waves traveling in a circle around
the proton like sound waves in a toroidal organ pipe. To avoid a discontinuity in
$\psi$, it is necessary that a whole number $n$ of wavelengths $\lambda$ should fit around the
circle, so the radius of the circle is constrained by the condition that $2\pi r = n\lambda$,
with $n = 1, 2, \ldots$ According to Eq. (5.1.2), $\lambda = 2\pi\hbar/p$, where $p = |\mathbf{p}|$, so de
Broglie's condition was

$$ pr = n\hbar , $$ (5.1.6)

which for non-relativistic electrons with $p = m_ev$ is the same as Bohr's
condition (3.4.2), but now with no need of the correspondence principle to
infer that $\hbar = h/2\pi$. De Broglie could then repeat Bohr's calculation, using
the non-relativistic formula $E = m_ev^2/2 - e^2/r$ for energy and the formula
$m_ev^2/r = e^2/r^2$ for centripetal acceleration, and thereby obtain Bohr's formula
$E = -e^4m_e/2\hbar^2n^2$ for the hydrogen energy levels. Nothing new had been
learned about hydrogen, but de Broglie's derivation at least gave a hint at an
explanation of Bohr's quantization condition.
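
As a worked illustration of this calculation, the sketch below solves the condition (5.1.6) together with the force balance $m_ev^2/r = e^2/r^2$ and evaluates the energy. It is only a sketch: the text uses unrationalized electrostatic units, so the code forms $e^2$ as $e^2_{\rm SI}/4\pi\epsilon_0$, and the constants are standard CODATA values typed in by hand rather than taken from the book.

```python
import math

HBAR = 1.054571817e-34        # J s
M_E = 9.1093837015e-31        # kg
E_CHARGE = 1.602176634e-19    # C
EPS0 = 8.8541878128e-12       # F/m
E2 = E_CHARGE**2 / (4.0 * math.pi * EPS0)   # the text's e^2, expressed in J m

def bohr_orbit(n):
    """Solve p r = n hbar and m v^2 / r = e^2 / r^2 for the n-th circular orbit."""
    v = E2 / (n * HBAR)               # from v * (m v r) = e^2 with m v r = n hbar
    r = n * HBAR / (M_E * v)          # de Broglie's condition (5.1.6)
    energy = 0.5 * M_E * v**2 - E2 / r
    return r, v, energy

for n in (1, 2, 3):
    r, v, energy = bohr_orbit(n)
    print(n, r, energy / E_CHARGE)    # radius in m, energy in eV
```

The $n = 1$ orbit comes out with a radius of about $0.53 \times 10^{-8}$ cm and an energy of about $-13.6$ eV, and the energies scale as $1/n^2$, as in Bohr's formula.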


Davisson—Germer Experiment 


There is a story that in his oral Ph.D. examination, de Broglie was asked if there
was some direct way of observing the wave nature of electrons, and he answered
that it might be possible to observe the diffraction of electron waves by a crystal
lattice, like the well-known diffraction of X-rays used for instance in measuring
the increase of wavelength in Compton scattering. Whether or not this story is
true, the idea was a good one. According to Eq. (5.1.2), the wavelength of a
non-relativistic electron with kinetic energy $E_e \ll m_ec^2$ is given by

$$ \lambda = 2\pi\hbar/p_e = 2\pi\hbar/\sqrt{2m_eE_e} = 12.26 \times 10^{-8}\ \text{cm}\ [E_e(\text{eV})]^{-1/2} . $$ (5.1.7)

Hence we only need electrons with energy a bit larger than 10 eV to get wavelengths
nearly as small as a typical lattice spacing, about $10^{-8}$ cm. This is no
coincidence. In de Broglie's interpretation of the Bohr quantization assumption,
the wavelength of an electron with an energy of a few eV, which is typical
of atomic binding energies, must fit a few times around an atomic orbit, and
therefore must be similar to the size of the atom, which is similar to the spacing
of atoms in crystals.

Several physicists tried and failed to observe the diffraction of electron
waves, until it was finally measured in 1927 by Clinton Davisson (1881-1958)
and Lester Germer (1896-1971) at the old Bell Telephone Laboratories building
on West Street in Manhattan.⁵ (It was also measured at about the same time
at the University of Aberdeen by George Paget Thomson (1892-1975), a
son of J. J. Thomson.) They used a beam of electrons with kinetic energy
54 eV, incident on a single crystal of nickel with a spacing of lattice planes
$d = 0.91 \times 10^{-8}$ cm (already known from measurements using X-ray diffraction).
Electrons are reflected not only from the surface of the crystal, but from
numerous planes within the nickel. At certain angles $\theta$ between the incident
and reflected waves all these reflected waves go off with the same phase and
therefore add constructively, leading to enhanced reflection at these angles.
According to a 1913 formula (derived in the appendix to this section) of
William Henry Bragg (1862-1942) and his son Lawrence Bragg (1890-1971),
for any sort of wave the angles $\theta_n$ between incident and reflected waves at
which reflection is enhanced in this way satisfy the Bragg formula:

$$ n\lambda = 2d\cos(\theta_n/2) , $$ (5.1.8)

where $n = 1, 2, 3, \ldots$ Davisson and Germer found an enhanced $n = 1$ reflection
at $\theta_1 = 50°$, giving a wavelength

$$ \lambda = 2 \times 0.91 \times 10^{-8}\ \text{cm} \times \cos(25°) = 1.65 \times 10^{-8}\ \text{cm} , $$

in satisfactory agreement with the wavelength $1.67 \times 10^{-8}$ cm expected from
Eq. (5.1.7) for a kinetic energy of 54 eV.
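
The two numbers being compared here can be reproduced in a few lines. This is a minimal sketch with hypothetical function names; the physical constants are standard values entered by hand, and the only inputs taken from the text are the 54 eV kinetic energy, the plane spacing $d = 0.91 \times 10^{-8}$ cm, and the $50°$ angle.

```python
import math

H_PLANCK = 6.62607015e-34     # J s (equal to 2 pi hbar)
M_E = 9.1093837015e-31        # kg
EV = 1.602176634e-19          # J per eV

def de_broglie_wavelength_cm(kinetic_energy_ev):
    """Eq. (5.1.7): lambda = 2 pi hbar / sqrt(2 m_e E_e), returned in cm."""
    momentum = math.sqrt(2.0 * M_E * kinetic_energy_ev * EV)
    return 100.0 * H_PLANCK / momentum

def bragg_wavelength_cm(d_cm, theta_deg, n=1):
    """Eq. (5.1.8): n lambda = 2 d cos(theta_n / 2), solved for lambda."""
    return 2.0 * d_cm * math.cos(math.radians(theta_deg) / 2.0) / n

print(de_broglie_wavelength_cm(1.0))          # coefficient in Eq. (5.1.7)
print(de_broglie_wavelength_cm(54.0))         # expected wavelength at 54 eV
print(bragg_wavelength_cm(0.91e-8, 50.0))     # wavelength inferred from the n = 1 peak
```

The first value gives back the coefficient $12.26 \times 10^{-8}$ cm, and the last two differ by only about one percent.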

The wave nature of the electron allowed the development of a new instrument,
the electron microscope. Recall that a photon of energy $E$ has wavelength
$\lambda_\gamma = 2\pi\hbar c/E$, so the ratio of the wavelength (5.1.7) of an electron of energy
$E$ to the wavelength of a photon of the same energy is

$$ \frac{\lambda_e}{\lambda_\gamma} = \sqrt{\frac{E}{2m_ec^2}} . $$

For energies in the range of 10 eV to 10 keV this is very much less than one,
giving electron microscopes much better resolution than microscopes using
photons of the same energy.

5 C. Davisson and L. Germer, Phys. Rev. 30, 707 (1927).

Figure 5.1 Derivation of the Bragg formula. The bold lines represent the
planes of the crystal lattice, seen edge on. Arrows indicate the direction of the
light rays.


Appendix: Derivation of the Bragg Formula 


Suppose that a wave of some sort is incident on a crystal lattice, with a ray
striking one plane of the lattice at point A, where it makes an angle $\phi$ between
the ray and the plane. (See Figure 5.1.) Part of the wave is reflected, with the
reflected ray making the same angle $\phi$ with the plane. Another part of the wave
continues in its original direction to the next plane, with the ray striking this
plane at point B, again at angle $\phi$. Part of this ray is reflected at B, again at
angle $\phi$, while another part continues to deeper planes. Draw a line from B in
a direction normal to the first reflected ray, intersecting this ray at a point C.
The purpose of this construction is that the two parallel reflected rays travel the
same distance from B and C to any distant detector, so the difference in the total
distance that each ray travels to the detector is AB − AC. The two rays interfere
constructively if this difference is a whole number $n$ of wavelengths $\lambda$:

$$ AB - AC = n\lambda , \qquad n = 1, 2, 3, \ldots $$

If this is satisfied, then the difference in the distance traveled by rays reflected
from the second and third planes will also be $n\lambda$, and so on down into deeper
and deeper planes, so all these reflected waves will interfere constructively. The
same is true of any rays that strike the crystal along parallel directions, whether
they are reflected from the first, second, or any other crystal plane. We then have
a very strong enhancement of the reflection. So we have to ask, how do AB and
AC depend on the lattice spacing and on the angle $\phi$ between the rays and the
crystal planes?

Draw a line from A to the second plane, which intersects it at a right angle at
a point D. The length of the line AD is the spacing $d$ of lattice planes. Looking
at the right triangle ADB (whose hypotenuse is AB) we see that

$$ AB = d/\sin\phi . $$

To calculate AC, note that the angle at B between BA and BC is $180° - 2\phi - 90° = 90° - 2\phi$,
so looking at the right triangle BAC (whose hypotenuse is
AB), we see that

$$ AC = AB\sin(90° - 2\phi) = AB\cos(2\phi) = AB\left[1 - 2\sin^2\phi\right] , $$

so

$$ AB - AC = 2AB\sin^2\phi = 2d\sin\phi , $$

and the condition for constructive interference is therefore

$$ n\lambda = 2d\sin\phi . $$

It is common to describe the reflection in terms of the angle $\theta$ between the
incident and reflected rays, $\theta = 180° - 2\phi$, so $\phi = 90° - \theta/2$, and the condition
for constructive interference is then

$$ n\lambda = 2d\cos(\theta/2) , $$

as was to be shown.
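
The geometry of this appendix can also be checked with explicit coordinates. In the sketch below the first lattice plane is taken as the line $y = 0$ and the second as $y = -d$, with A at the origin; this coordinate choice and the function name are assumptions made only for the illustration.

```python
import math

def path_difference(d, phi):
    """Return AB - AC for plane spacing d and glancing angle phi (radians)."""
    # Incident ray travels along (cos phi, -sin phi) from A = (0, 0) to B on y = -d.
    B = (d / math.tan(phi), -d)
    # Both reflected rays travel along the unit vector u = (cos phi, sin phi).
    u = (math.cos(phi), math.sin(phi))
    AB = math.hypot(B[0], B[1])
    # C is the foot of the perpendicular from B onto the first reflected ray,
    # so AC is the (signed) projection of the vector AB onto u.
    AC = B[0] * u[0] + B[1] * u[1]
    return AB - AC

d = 0.91
for phi_deg in (10, 25, 40, 65, 80):
    phi = math.radians(phi_deg)
    theta = math.pi - 2.0 * phi           # angle between incident and reflected rays
    print(phi_deg, path_difference(d, phi), 2 * d * math.sin(phi), 2 * d * math.cos(theta / 2))
```

For every angle tried, the computed path difference agrees with both $2d\sin\phi$ and $2d\cos(\theta/2)$. For $\phi > 45°$ the projection AC is negative, meaning that C lies behind A on the extended reflected ray, and the formula still holds.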


5.2 The Schrödinger Equation


Wave Equation in a Potential 


De Broglie in 1923 had described the wave function associated with a free
electron and had scored some success in applying this to the electron in a
hydrogen atom, imagining a free electron wave running around the electron
orbit. But of course electrons in atoms are not free. Starting in 1925, Erwin
Schrödinger struggled to extend the idea of the wave function to an electron
moving in a potential.⁶

Schrödinger's starting point was de Broglie's wave theory. Equation (5.1.2)
gives the wave function of a free electron of momentum $\mathbf{p}$ as

$$ \psi_{\mathbf{p}}(\mathbf{x}, t) \propto \exp\left[i\mathbf{p}\cdot\mathbf{x}/\hbar - iE(\mathbf{p})t/\hbar\right] . $$

The content of this explicit formula can be expressed as a pair of differential
equations

$$ -i\hbar\nabla\psi_{\mathbf{p}}(\mathbf{x}, t) = \mathbf{p}\,\psi_{\mathbf{p}}(\mathbf{x}, t) , $$ (5.2.1)

$$ i\hbar\frac{\partial}{\partial t}\psi_{\mathbf{p}}(\mathbf{x}, t) = E(\mathbf{p})\,\psi_{\mathbf{p}}(\mathbf{x}, t) , $$ (5.2.2)

where, now in a non-relativistic approximation,

$$ E(\mathbf{p}) \simeq m_ec^2 + \frac{\mathbf{p}^2}{2m_e} . $$ (5.2.3)

An electron that is bound in an atom cannot have a definite momentum
(classically, it goes round and round its orbit), so we would not expect the bound
electron wave function to satisfy an equation like (5.2.1). On the other hand, we
can try to use something like Eq. (5.2.2) to find the wave function of a bound
electron, with Eq. (5.2.1) used only to interpret $\mathbf{p}$ as $-i\hbar\nabla$ in $E(\mathbf{p})$. Schrödinger
thus took the equation for a bound electron as

$$ i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}, t) = E(-i\hbar\nabla, \mathbf{x})\,\psi(\mathbf{x}, t) , $$ (5.2.4)

where now $E$ is given a dependence on $\mathbf{x}$ to account for the presence of potential
energy. For a non-relativistic electron in a potential $V(\mathbf{x})$ the energy is
$E(\mathbf{p}, \mathbf{x}) = m_ec^2 + \mathbf{p}^2/2m_e + V(\mathbf{x})$, and Eq. (5.2.4) reads

$$ i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}, t) = \left[m_ec^2 - \frac{\hbar^2}{2m_e}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x}, t) . $$ (5.2.5)


This is known as the time-dependent Schrödinger equation.
Because the potential is assumed time-independent, Eq. (5.2.5) has solutions
of the form

$$ \psi(\mathbf{x}, t) = \exp\left[-i(m_ec^2 + E)t/\hbar\right]\psi(\mathbf{x}) , $$ (5.2.6)

where

$$ \left[-\frac{\hbar^2}{2m_e}\nabla^2 + V(\mathbf{x})\right]\psi(\mathbf{x}) = E\,\psi(\mathbf{x}) . $$ (5.2.7)

This is known as the time-independent Schrödinger equation. It is interpreted as
the condition for $\psi(\mathbf{x})$ to represent a state with definite energy $E$, relative to the
rest-mass energy $m_ec^2$.

6 E. Schrödinger, Ann. Phys. 79, 361, 409 (1926).


Boundary Conditions 


This all began as just guesswork. Schrödinger and other physicists at first imagined
that $\psi(\mathbf{x}, t)$ gives an indication of how much of the electron is near $\mathbf{x}$ at
time $t$. As we will see when we come to scattering in Section 5.6, it was only
a few years later that Max Born correctly interpreted $|\psi(\mathbf{x}, t)|^2$ as a probability
density; that is, $|\psi(\mathbf{x}, t)|^2\,d^3x$ is the probability that the electron is in a small
volume $d^3x$ at position $\mathbf{x}$ and time $t$. For the present, all we need to know is that
the relevant solutions of Eq. (5.2.7) are those with

$$ \int |\psi(\mathbf{x})|^2\, d^3x < \infty , $$ (5.2.8)

so that by dividing $\psi(\mathbf{x})$ by the square root of this integral we obtain a normalized
wave function, corresponding to a probability density for which the total
probability of the electron being somewhere is 100%.

If we assume (as is generally the case in practice) that $V(\mathbf{x})$ vanishes at large
distances $|\mathbf{x}|$, then at large $|\mathbf{x}|$ Eq. (5.2.7) becomes

$$ E\,\psi(\mathbf{x}) \to -\frac{\hbar^2}{2m_e}\nabla^2\psi(\mathbf{x}) . $$ (5.2.9)

A bound electron must have $E < 0$ (since otherwise it would be energetically
possible for the electron to escape to infinite distance) so Eq. (5.2.9) has solutions
that at large $|\mathbf{x}|$ behave as

$$ \psi(\mathbf{x}) \to P(\mathbf{x})\exp(\pm\kappa|\mathbf{x}|) , $$ (5.2.10)

where $\kappa$ is the positive square root,

$$ \kappa = +\sqrt{2m_e|E|/\hbar^2} , $$ (5.2.11)

and $P(\mathbf{x})$ is some function such as a polynomial that varies much more slowly
than an exponential for large $|\mathbf{x}|$. (A derivative $\partial/\partial x_i$ acting on the exponential
in Eq. (5.2.10) yields a constant factor $\pm\kappa$, while a gradient acting on a function
$P(\mathbf{x})$ in Eq. (5.2.10) that grows as a power of $|\mathbf{x}|$ gives a factor for $|\mathbf{x}| \to \infty$
proportional to $1/|\mathbf{x}|$.) Solutions of the time-independent Schrödinger equation
thus come in pairs, one of which (the one with a minus sign in the exponential
in Eq. (5.2.10)) satisfies the condition (5.2.8) at least as far as convergence at
large $|\mathbf{x}|$ is concerned, while the other does not.

We shall see that there is also a smoothness condition on $\psi(\mathbf{x})$ at $\mathbf{x} \to 0$
that must be imposed on the wave function. We can always find solutions of
the Schrödinger equation (5.2.7) that satisfy either this condition at $\mathbf{x} \to 0$ or
the condition that $\psi(\mathbf{x}) \propto \exp(-\kappa|\mathbf{x}|)$ at $\mathbf{x} \to \infty$, but we cannot impose both
conditions except for certain discrete values of $E$. These are the allowed energy
levels of the bound electron. Schrödinger was justly proud that the existence
of discrete energy levels, and hence the existence of atomic spectra discovered
over a century earlier, were now explained as a mathematical consequence of
boundary conditions imposed on a wave equation, rather than Bohr's ad hoc
assumption of angular momentum quantization.


Spherical Symmetry 


We now specialize to the case of spherical symmetry, which applies in one-electron
atoms (and approximately for each electron in atoms with many electrons),
for which the potential is only a function of $r = |\mathbf{x}|$. There is a
mathematical identity that is useful for a wide variety of problems with spherical
symmetry in various branches of mathematical physics:

$$ \nabla^2 f(\mathbf{x}) = \frac{1}{r^2}\frac{\partial}{\partial r}\left[r^2\frac{\partial f(\mathbf{x})}{\partial r}\right] + \frac{1}{r^2}(\mathbf{x}\times\nabla)^2 f(\mathbf{x}) , $$ (5.2.12)

where $f(\mathbf{x})$ is an arbitrary differentiable function of position. (This can be
derived in the same way that in ordinary vector algebra we derive the familiar
identity $(\mathbf{a}\times\mathbf{b})^2 = \mathbf{a}^2\mathbf{b}^2 - (\mathbf{a}\cdot\mathbf{b})^2$, but here keeping track of the order of
the position variable and derivatives that act on it.) As already mentioned,
Schrödinger assumed that $-i\hbar\nabla$ should be interpreted as the operator representing
the momentum, so the operator representing the orbital angular
momentum is

$$ \mathbf{L} = -i\hbar\,\mathbf{x}\times\nabla , $$ (5.2.13)

and we can write Eq. (5.2.12) as the identity

$$ \nabla^2 f(\mathbf{x}) = \frac{1}{r^2}\frac{\partial}{\partial r}\left[r^2\frac{\partial f(\mathbf{x})}{\partial r}\right] - \frac{1}{\hbar^2r^2}\mathbf{L}^2 f(\mathbf{x}) . $$ (5.2.14)

The time-independent Schrödinger equation (5.2.7) thus takes the form

$$ -\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{\partial}{\partial r}\left[r^2\frac{\partial\psi(\mathbf{x})}{\partial r}\right] + \frac{1}{2m_er^2}\mathbf{L}^2\psi(\mathbf{x}) + V(r)\psi(\mathbf{x}) = E\,\psi(\mathbf{x}) . $$ (5.2.15)


Radial and Angular Wave Functions 


In order for the gradient operator in the Schrödinger equation to be well-defined,
we need the wave function to be analytic in $\mathbf{x}$, by which is meant that it can be
expanded in a power series about any point, and in particular about the origin
$\mathbf{x} = 0$. As we shall see, this condition can be imposed on the wave function
unless the potential is very singular.

Suppose that for some particular wave function, the smallest power of $\mathbf{x}$ in the
expansion of the wave function around the origin is some integer $\ell = 0, 1, 2, \ldots$
Then for $\mathbf{x} \to 0$ the wave function is dominated by a homogeneous polynomial
of order $\ell$ in the coordinates $x_i$, that is, a sum of terms each of which has $\ell$
factors of the coordinates $x_i$. For instance, a homogeneous polynomial in $\mathbf{x}$ of
order zero is a constant, a homogeneous polynomial in $\mathbf{x}$ of order one is a linear
combination of $x_1$, $x_2$, and $x_3$, and a homogeneous polynomial in $\mathbf{x}$ of order two
is a linear combination of

$$ x_1^2, \ x_2^2, \ x_3^2, \ x_1x_2, \ x_2x_3, \ x_3x_1 . $$

Defining a radial coordinate $r = |\mathbf{x}|$ and a unit vector $\hat{\mathbf{x}} = \mathbf{x}/r$, the wave
function for $r \to 0$ can now be written

$$ \psi(\mathbf{x}) \to r^\ell\, Y_\ell(\hat{\mathbf{x}}) , $$ (5.2.16)

where $Y_\ell$ is some homogeneous polynomial of order $\ell$ in the unit vector $\hat{\mathbf{x}}$. (As
we shall see, for $\ell \ge 1$ there is more than one such polynomial, which will later
have to be distinguished by attaching an additional label to $Y_\ell$.)

Just knowing the value of $\ell$ is enough to tell us how $\mathbf{L}^2$ acts on the wave
function. Note that $\mathbf{L}$ does not act on functions of $r$, because
$\mathbf{L}f(r) = -i\hbar(\mathbf{x}\times\hat{\mathbf{x}}f'(r)) = 0$, so $\mathbf{L}^2$ acts only on the direction $\hat{\mathbf{x}}$. For a wave function that goes
as (5.2.16) for $r \to 0$, the first two terms on the left of Eq. (5.2.15) go as

$$ -\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{\partial}{\partial r}\left[r^2\frac{\partial\psi(\mathbf{x})}{\partial r}\right] \to -\frac{\hbar^2}{2m_e}\,\ell(\ell+1)\,r^{\ell-2}\,Y_\ell(\hat{\mathbf{x}}) , $$

$$ \frac{1}{2m_er^2}\mathbf{L}^2\psi(\mathbf{x}) \to \frac{1}{2m_e}\,r^{\ell-2}\,\mathbf{L}^2Y_\ell(\hat{\mathbf{x}}) , $$

while $E\psi$ and (as long as the potential does not blow up as fast as $1/r^2$ for
$r \to 0$) also $V(r)\psi$ are negligible for $r \to 0$ compared with $r^{\ell-2}$. Hence the
time-independent Schrödinger equation (5.2.15) requires that

$$ \mathbf{L}^2Y_\ell(\hat{\mathbf{x}}) = \hbar^2\ell(\ell+1)\,Y_\ell(\hat{\mathbf{x}}) . $$ (5.2.17)

We can therefore find solutions of the Schrödinger equation (5.2.15) of the form

$$ \psi(\mathbf{x}) = R(r)\,Y_\ell(\hat{\mathbf{x}}) , $$ (5.2.18)

where $R(r)$ satisfies the radial wave equation

$$ -\frac{\hbar^2}{2m_e}\frac{1}{r^2}\frac{d}{dr}\left[r^2\frac{dR(r)}{dr}\right] + \frac{\ell(\ell+1)\hbar^2}{2m_er^2}R(r) + V(r)R(r) = E\,R(r) , $$ (5.2.19)

with boundary conditions

$$ R(r) \propto r^\ell \ \ \text{for} \ r \to 0 , \qquad R(r) \propto P(r)\exp(-\kappa r) \ \ \text{for} \ r \to \infty . $$

Here the term proportional to $\ell(\ell+1)$ acts as a positive and hence repulsive
potential arising from the centrifugal force acting on an electron with non-zero
angular momentum. The function $R(r)$ will turn out to depend on an index in
addition to $\ell$, on which the energy also depends.


Angular Multiplicity 


As we will see when we come to the periodic table of elements, it is important to
know the number of independent solutions of Eq. (5.2.17) for a given $\ell$. For this
purpose, it is convenient first to recast Eq. (5.2.17) as a condition on $r^\ell Y_\ell(\hat{\mathbf{x}})$,
a homogeneous polynomial of order $\ell$ in the three-vector $\mathbf{x}$. From Eq. (5.2.14),
we see that Eq. (5.2.17) is equivalent to the condition

$$ \nabla^2\, r^\ell\, Y_\ell(\hat{\mathbf{x}}) = 0 . $$ (5.2.20)

To distinguish among the solutions of Eq. (5.2.20), it is convenient to consider
the action of the operator $L_3 = -i\hbar(x_1\partial/\partial x_2 - x_2\partial/\partial x_1)$. Note that

$$ L_3(x_1 \pm ix_2) = -i\hbar(-x_2 \pm ix_1) = \pm\hbar(x_1 \pm ix_2) , \qquad L_3x_3 = 0 . $$

We can take a complete set of independent homogeneous polynomials in $\mathbf{x}$ of
order $\ell$ as the products of $\nu_\pm$ factors of $x_1 \pm ix_2$ and $\ell - \nu_+ - \nu_-$ factors of $x_3$
for various non-negative integers $\nu_\pm$. The action of $L_3$ on these products is

$$ L_3\left((x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}\right) = \hbar(\nu_+ - \nu_-)\,(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-} . $$

Now, for an arbitrary function $f(\mathbf{x})$,

$$ \frac{\partial^2}{\partial x_1^2}(L_3f) = -2i\hbar\frac{\partial^2f}{\partial x_1\partial x_2} + L_3\frac{\partial^2f}{\partial x_1^2} , $$

$$ \frac{\partial^2}{\partial x_2^2}(L_3f) = +2i\hbar\frac{\partial^2f}{\partial x_1\partial x_2} + L_3\frac{\partial^2f}{\partial x_2^2} , $$

$$ \frac{\partial^2}{\partial x_3^2}(L_3f) = L_3\frac{\partial^2f}{\partial x_3^2} , $$

so

$$ \nabla^2(L_3f) = L_3\nabla^2f . $$

(We shall see in Section 5.4 that this is just a consequence of the rotational invariance
of the Laplacian $\nabla^2$.) It follows that when $\nabla^2$ acts on a sum of terms of
the form $(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}$, all with the same value of $\nu_+ - \nu_-$,
it gives a sum of terms that again all have that value of $\nu_+ - \nu_-$. We can therefore
find solutions of Eq. (5.2.20) that are sums of products of coordinates all with
the same value of $m = \nu_+ - \nu_-$, and label the solutions as $Y_\ell^m(\hat{\mathbf{x}})$, where

$$ L_3Y_\ell^m(\hat{\mathbf{x}}) = \hbar m\,Y_\ell^m(\hat{\mathbf{x}}) . $$

How many solutions of Eq. (5.2.20) are there for a given $\ell$? First let's ask
how many independent homogeneous polynomials in $\mathbf{x}$ of order $\ell$ of the form
$(x_1 + ix_2)^{\nu_+}(x_1 - ix_2)^{\nu_-}x_3^{\ell-\nu_+-\nu_-}$ there are for a given $\ell$. The exponent $\nu_+$
can be any integer from 0 to $\ell$ and, for a given $\nu_+$, the exponent $\nu_-$ can be any
integer from 0 to $\ell - \nu_+$, so the number of these independent homogeneous
polynomials of order $\ell$ in $\mathbf{x}$ is therefore

$$ N_\ell = \sum_{\nu_+=0}^{\ell}\ \sum_{\nu_-=0}^{\ell-\nu_+} 1 = \sum_{\nu_+=0}^{\ell}(\ell - \nu_+ + 1) = (\ell+1)^2 - \frac{\ell(\ell+1)}{2} = \frac{(\ell+1)(\ell+2)}{2} . $$

We also have to impose the condition (5.2.20). The function $\nabla^2(r^\ell Y_\ell(\hat{\mathbf{x}}))$ is
itself a homogeneous polynomial of order $\ell - 2$ in the three-vector $\mathbf{x}$, so setting
this function equal to zero imposes $N_{\ell-2}$ conditions on $r^\ell Y_\ell(\hat{\mathbf{x}})$, and the number
of independent $Y_\ell$ subject to these conditions is thus

$$ N_\ell - N_{\ell-2} = \frac{(\ell+1)(\ell+2)}{2} - \frac{(\ell-1)\ell}{2} = 2\ell + 1 . $$

But this is the same as the number of possible values of $m = \nu_+ - \nu_-$, ranging
from $m = -\ell$ to $m = +\ell$, so there must be just one function $Y_\ell^m(\hat{\mathbf{x}})$ for each $\ell$
and $m$.
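
The counting argument can be confirmed by direct enumeration. The sketch below lists the exponent pairs $(\nu_+, \nu_-)$ allowed for a given $\ell$ and checks the three statements made above; the function names are illustrative only.

```python
def monomial_exponents(l):
    """All (nu_plus, nu_minus) with nu_plus + nu_minus <= l; x_3 gets the remaining power."""
    return [(nu_plus, nu_minus)
            for nu_plus in range(l + 1)
            for nu_minus in range(l - nu_plus + 1)]

def count_monomials(l):
    """N_l, the number of independent homogeneous polynomials of order l."""
    return len(monomial_exponents(l)) if l >= 0 else 0

def count_harmonics(l):
    """N_l - N_{l-2}: what is left after imposing the Laplace condition (5.2.20)."""
    return count_monomials(l) - count_monomials(l - 2)

def m_values(l):
    """Distinct values of m = nu_plus - nu_minus that occur at order l."""
    return sorted({nu_plus - nu_minus for nu_plus, nu_minus in monomial_exponents(l)})

for l in range(5):
    print(l, count_monomials(l), count_harmonics(l), m_values(l))
```

For each $\ell$ the output shows $N_\ell = (\ell+1)(\ell+2)/2$, a remainder of $2\ell + 1$ after the $N_{\ell-2}$ conditions are subtracted, and values of $m$ running from $-\ell$ to $+\ell$.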

The index $m$ does not appear in Eq. (5.2.19), so the $2\ell + 1$ states that differ
only in the value of $m$ all have the same energy as long as the spherical symmetry
of the atom is maintained. The degeneracy of these states can be lifted
by exposing the atom to an external perturbation which marks out a preferred
direction, in which case the energies of the different states will be split from one
another. Where the external perturbation is a magnetic field this is known as the
Zeeman effect, after Pieter Zeeman (1865-1943), who first reported it in 1897.
It was not possible to understand the details of the splitting of energy levels
in the Zeeman effect until the discovery of electron spin, to be discussed in
Section 5.4. We will calculate the Zeeman effect in Section 5.9, using the methods
of perturbation theory, as an application of the quantum theory of the interaction
of electrons with electromagnetic fields, to be described in Section 5.8.


Spherical Harmonics 


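Explicit formulas for spherical harmonics are needed in some applications of
quantum mechanics but they are not needed in the calculations of energy levels
in one-electron atoms. Nevertheless, to make this discussion of angular dependence
concrete, we give here all the spherical harmonics for $\ell = 0$, $\ell = 1$, and
$\ell = 2$:

$$ Y_0^0 = \sqrt{\frac{1}{4\pi}} , $$

$$ Y_1^1 = -\sqrt{\frac{3}{8\pi}}\,(\hat{x}_1 + i\hat{x}_2) = -\sqrt{\frac{3}{8\pi}}\,\sin\theta\, e^{i\phi} , $$

$$ Y_1^0 = \sqrt{\frac{3}{4\pi}}\,\hat{x}_3 = \sqrt{\frac{3}{4\pi}}\,\cos\theta , $$

$$ Y_1^{-1} = \sqrt{\frac{3}{8\pi}}\,(\hat{x}_1 - i\hat{x}_2) = \sqrt{\frac{3}{8\pi}}\,\sin\theta\, e^{-i\phi} , $$

$$ Y_2^2 = \sqrt{\frac{15}{32\pi}}\,(\hat{x}_1 + i\hat{x}_2)^2 = \sqrt{\frac{15}{32\pi}}\,(\sin\theta)^2\, e^{2i\phi} , $$

$$ Y_2^1 = -\sqrt{\frac{15}{8\pi}}\,(\hat{x}_1 + i\hat{x}_2)\hat{x}_3 = -\sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\, e^{i\phi} , $$

$$ Y_2^0 = \sqrt{\frac{5}{16\pi}}\,(2\hat{x}_3^2 - \hat{x}_1^2 - \hat{x}_2^2) = \sqrt{\frac{5}{16\pi}}\,(3(\cos\theta)^2 - 1) , $$

$$ Y_2^{-1} = \sqrt{\frac{15}{8\pi}}\,(\hat{x}_1 - i\hat{x}_2)\hat{x}_3 = \sqrt{\frac{15}{8\pi}}\,\sin\theta\cos\theta\, e^{-i\phi} , $$

$$ Y_2^{-2} = \sqrt{\frac{15}{32\pi}}\,(\hat{x}_1 - i\hat{x}_2)^2 = \sqrt{\frac{15}{32\pi}}\,(\sin\theta)^2\, e^{-2i\phi} . $$

They are written in terms of the angles appearing in spherical polar coordinates:

$$ x_1 = r\sin\theta\cos\phi , \qquad x_2 = r\sin\theta\sin\phi , \qquad x_3 = r\cos\theta . $$ (5.2.21)

The numerical factors have been chosen to make the spherical harmonics orthonormal,
in the sense that

$$ \int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, d\theta \ Y_\ell^{m*}(\theta, \phi)\, Y_{\ell'}^{m'}(\theta, \phi) = \delta_{\ell\ell'}\,\delta_{mm'} . $$ (5.2.22)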
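
The orthonormality condition (5.2.22) can be verified numerically for the nine functions with $\ell \le 2$. The sketch below uses the standard angular forms of these spherical harmonics and a simple midpoint rule on a grid in $\theta$ and $\phi$; the grid sizes and function names are assumptions chosen for the illustration, and the quadrature is only accurate to a few parts in ten thousand.

```python
import cmath
import math

def Y(l, m, theta, phi):
    """Spherical harmonics for l <= 2 in their standard angular form."""
    s, c = math.sin(theta), math.cos(theta)
    table = {
        (0, 0): math.sqrt(1 / (4 * math.pi)),
        (1, 1): -math.sqrt(3 / (8 * math.pi)) * s,
        (1, 0): math.sqrt(3 / (4 * math.pi)) * c,
        (1, -1): math.sqrt(3 / (8 * math.pi)) * s,
        (2, 2): math.sqrt(15 / (32 * math.pi)) * s * s,
        (2, 1): -math.sqrt(15 / (8 * math.pi)) * s * c,
        (2, 0): math.sqrt(5 / (16 * math.pi)) * (3 * c * c - 1),
        (2, -1): math.sqrt(15 / (8 * math.pi)) * s * c,
        (2, -2): math.sqrt(15 / (32 * math.pi)) * s * s,
    }
    return table[(l, m)] * cmath.exp(1j * m * phi)

def overlap(a, b, n_theta=100, n_phi=16):
    """Midpoint-rule estimate of the integral in Eq. (5.2.22) for labels a=(l,m), b=(l',m')."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0j
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += Y(*a, theta, phi).conjugate() * Y(*b, theta, phi) * math.sin(theta)
    return total * d_theta * d_phi

labels = [(l, m) for l in range(3) for m in range(-l, l + 1)]
worst = max(abs(overlap(a, b) - (1.0 if a == b else 0.0)) for a in labels for b in labels)
print(worst)
```

The largest deviation from $\delta_{\ell\ell'}\delta_{mm'}$ over all 81 pairs should be small, limited by the midpoint rule in $\theta$.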


Hydrogenic Energy Levels 


Let’s now specialize further to the case of a one-electron atom, such as neutral 
hydrogen, singly ionized helium, etc., with nuclear charge Ze. The electrostatic 
potential felt by the electron is then the Coulomb potential V(r) = —Ze?/r. 
We seek a solution of the form (5.2.18), w(x) = R(r)Y;"(x), where R(r) is 
some function only of r. Then the radial wave equation (5.2.19) reads 
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2 2 2 
h’ 1 a [pao | EE + DA gy ZERO _ ere. 6.2.23) 
r 


eae a R 
2m, r2 dr or 2mer2 @) 


We can easily find a family of exact solutions of this equation. Recall that,
according to the definition of $\ell$, for $r \to 0$ we have $R(r) \propto r^\ell$, while as shown
above, for $r \to \infty$ the function $R(r)$ goes as $\exp(-\kappa r)$ times some function
of $r$ that grows more slowly than $\exp(\kappa r)$. So let us try for a solution of the
form

$$ R(r) = r^\ell \exp(-\kappa r). \tag{5.2.24} $$


The first term in Eq. (5.2.23) then contains a contribution in which both
derivatives act on the exponential, which takes the form $-(\hbar^2\kappa^2/2m_e)R(r)$, which
according to Eq. (5.2.11) just matches $E R(r)$ on the right-hand side. It also
contains a contribution in which both derivatives act on powers of $r$; this gives
a contribution

$$ -\frac{\hbar^2 \exp(-\kappa r)}{2m_e}\,\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{d r^\ell}{dr}\right) = -\frac{\hbar^2\,\ell(\ell+1)}{2m_e r^2}\,R(r), $$

which cancels the second term on the left-hand side of Eq. (5.2.23). The first
term in Eq. (5.2.23) also contains contributions in which one derivative acts on
$\exp(-\kappa r)$, giving a factor $-\kappa$, while the other derivative acts on a power of $r$,
giving a factor $[(\ell + 2) + \ell]/r = 2(\ell + 1)/r$, so these contributions add up to

$$ \frac{2(\ell+1)\hbar^2\kappa}{2m_e r}\,R(r). $$


This remaining contribution must cancel the Coulomb term $-Ze^2R(r)/r$, so
the necessary and sufficient condition for a solution of the form (5.2.24) is

$$ \frac{(\ell+1)\hbar^2\kappa}{m_e} = Ze^2. $$

We conclude that these solutions have

$$ E = -\frac{\hbar^2\kappa^2}{2m_e} = -\frac{Z^2e^4m_e}{2\hbar^2(\ell+1)^2}, \tag{5.2.25} $$

which agrees with the Bohr formula (3.4.6) if we identify $n$ with $\ell + 1$.
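The cancellation argument above can be checked numerically. The following is a minimal sketch, not part of the derivation: it assumes units with $\hbar = m_e = e = 1$, approximates the derivatives in Eq. (5.2.23) by central differences, and uses a hypothetical helper `radial_residual` that returns the left-hand side minus the right-hand side for the trial function (5.2.24).

```python
import math

def radial_residual(r, ell, Z=1.0, kappa=None, h=1e-3):
    """Left side minus right side of Eq. (5.2.23) for R = r**ell * exp(-kappa*r).

    Units hbar = m_e = e = 1 are an assumption of this sketch.
    If kappa is omitted, the value required by (ell+1)*kappa = Z is used.
    """
    if kappa is None:
        kappa = Z / (ell + 1)
    energy = -0.5 * kappa ** 2              # E = -hbar^2 kappa^2 / (2 m_e)

    def R(x):
        return x ** ell * math.exp(-kappa * x)

    def flux(x):
        # r^2 dR/dr, with dR/dr from a central difference of step h
        return x ** 2 * (R(x + h / 2) - R(x - h / 2)) / h

    kinetic = -0.5 * (flux(r + h / 2) - flux(r - h / 2)) / (h * r ** 2)
    centrifugal = 0.5 * ell * (ell + 1) / r ** 2 * R(r)
    coulomb = -Z / r * R(r)
    return kinetic + centrifugal + coulomb - energy * R(r)

for ell in range(3):
    print(ell, [radial_residual(r, ell) for r in (0.5, 1.0, 2.0, 5.0)])
```

With the condition $(\ell+1)\kappa = Z$ imposed the residual is zero up to discretization error, while passing any other `kappa` leaves an uncancelled term proportional to $R(r)/r$, as the argument in the text implies.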
This is not the only class of solutions. It is straightforward though tedious to
show that, in addition to solutions of the form (5.2.24), there are more general
solutions of the form

$$ R(r) = r^\ell\, P_{\ell,\nu}(r)\, \exp(-\kappa r), \tag{5.2.26} $$

where $P_{\ell,\nu}(r)$ is a polynomial of order $\nu$. Without actually constructing these
polynomials, we can relate the energy to $\ell$ and $\nu$ by considering Eq. (5.2.23) in
the limit $r \to \infty$. In this limit, Eq. (5.2.26) gives

$$ R(r) \propto r^{\ell+\nu} \exp(-\kappa r). $$


By repeating the arguments previously applied to the solution (5.2.24), but now
only for $r \to \infty$, we see that the first term in Eq. (5.2.23) contains a contribution
in which both derivatives act on the exponential, which matches the term $ER$
on the right-hand side, and another contribution in which both derivatives act
on powers of $r$, which is negligible for $r \to \infty$, as is the centrifugal potential
and terms in which one or two derivatives act on sub-leading terms in $P_{\ell,\nu}(r)$.
This leaves the potential term and the part of the first term in Eq. (5.2.23) in
which one derivative acts on the exponential and the other on the leading power
$r^{\ell+\nu}$, which gives a contribution that cancels the potential term if
$(\ell+\nu+1)\hbar^2\kappa/m_e = Ze^2$, or in other words if $E$ is given by the Bohr formula

$$ E = -\frac{\hbar^2\kappa^2}{2m_e} = -\frac{Z^2e^4m_e}{2\hbar^2n^2}, \tag{5.2.27} $$

where now

$$ n = \nu + \ell + 1, \tag{5.2.28} $$

with $\nu$ the order of the polynomial in Eq. (5.2.26), and hence a non-negative
integer.


The positive-definite integer $n \geq \ell + 1$ defined by Eq. (5.2.28), on which
the energy solely depends, is known as the principal quantum number.
Spectroscopists have developed a terminology, in which the letters s, p, d, f and so
on stand for $\ell = 0, 1, 2, 3$, etc. A state is labeled first with $n$, and then with
the letter indicating $\ell$, so in hydrogen the states are 1s, 2s, 2p, 3s, 3p, 3d, and
so on.

We can now work out the degeneracy of these energy levels. Since the energy
depends only on $n$, according to Eq. (5.2.28) for each energy we can have $\ell$
equal to anything between $\ell = 0$ and $\ell = n-1$ (for which respectively $\nu = n-1$
and $\nu = 0$). We have seen that for each $\ell$ there are $2\ell + 1$ states distinguished by
different values of $m$. So, according to this reasoning, the total number of states
for a given energy and hence a given value of $n$ is

$$ N_n = \sum_{\ell=0}^{n-1}(2\ell+1) = 2\,\frac{(n-1)n}{2} + n = n^2. \tag{5.2.29} $$

As we will see in Section 5.4, because electron spin has been left out in this
calculation the degeneracy (5.2.29) is too small by a factor 2.
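The counting in Eq. (5.2.29) can be reproduced by brute-force enumeration. The sketch below is illustrative only; the helper names `hydrogen_states` and `label` are hypothetical, and spin is ignored just as in the text.

```python
LETTERS = "spdf"  # spectroscopic letters for ell = 0, 1, 2, 3

def hydrogen_states(n):
    """All (ell, m) pairs allowed for principal quantum number n (spin ignored)."""
    return [(ell, m) for ell in range(n) for m in range(-ell, ell + 1)]

def label(n, ell):
    """Spectroscopic label such as '3d'; only ell <= 3 is covered in this sketch."""
    return f"{n}{LETTERS[ell]}"

for n in range(1, 5):
    labels = [label(n, ell) for ell in range(n)]
    print(n, labels, len(hydrogen_states(n)))   # the count equals n**2
```

For $n = 3$, for example, this lists the 3s, 3p, and 3d levels, with $1 + 3 + 5 = 9$ states in all.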


5.3 General Principles of Quantum Mechanics 


As we have seen in the story so far, quantum mechanics began with guesswork:
Einstein’s guess that the energy and momentum of light waves come in
particles; Bohr’s guess that if the energy and momentum of radiation are
quantized then so are other things, such as the angular momentum of electrons
in atomic orbits; de Broglie’s guess that if electromagnetic waves consist of
particles then particles such as electrons behave like waves; and Schrödinger’s
guess that the differential equations for de Broglie’s waves could be modified for
atomic electrons by inserting a potential. It is time that we move on from this
guesswork and describe the general principles of quantum mechanics as they 
emerged in the formalism of wave mechanics soon after 1925. Then in following 
sections we shall go on to applications of quantum mechanics in contexts more 
general than those considered so far. 


States and Wave Functions 


The first general principle of wave mechanics is that physical states are
represented by wave functions, functions $\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)$, with one coordinate
argument for each particle in the system (and, as we shall see in the next section,
with the wave function depending also on the 3-component of each particle’s
spin angular momentum). As anticipated in Section 5.2 for the case of
single-particle wave functions, the probability in a state represented by wave function
$\psi$ that one particle is in a small volume $d^3x_1$ around $\mathbf{x}_1$, another particle is in a
small volume $d^3x_2$ around $\mathbf{x}_2$, and so on, is

$$ dP = |\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)|^2\, d^3x_1\, d^3x_2 \cdots \tag{5.3.1} $$

Since with 100% probability the particles have to be somewhere, this requires
the wave function to satisfy the normalization condition

$$ \int d^3x_1\, d^3x_2 \cdots\, |\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)|^2 = 1. \tag{5.3.2} $$

Two wave functions that differ only by a constant phase factor of absolute value
unity represent the same state. In solving differential equations for the wave
function, the important thing is that the integral (5.3.2) should be finite; in
that case we can always find a $\psi$ that satisfies Eq. (5.3.2) by dividing the wave
function by the square root of this integral.


Observables and Operators 


The second general principle of wave mechanics is that observable physical
quantities are represented by linear operators on these wave functions, here
generally distinguished by upper case letters. By an operator $A$ being “linear”
is meant that, for any pair of wave functions $\psi_1$ and $\psi_2$ and numbers $a_1$ and $a_2$,
we have

$$ A[a_1\psi_1 + a_2\psi_2] = a_1 A\psi_1 + a_2 A\psi_2. \tag{5.3.3} $$

As part of this principle, a state represented by a wave function $\psi$ has a definite
value $a$ for the observable represented by an operator $A$ if and only if

$$ A\psi = a\psi. \tag{5.3.4} $$

In this case we say that $\psi$ is an eigenfunction of $A$ with eigenvalue $a$.

For instance, the operator $P_{nj}$ that represents the $j$th component (with
$j = 1,2,3$) of the momentum of the $n$th particle acts as $-i\hbar$ times the partial
derivative $\partial/\partial x_{nj}$ with respect to the $j$th component of the coordinate of the $n$th
particle, acting on whatever function is to the right. This is clearly linear in the
sense of Eq. (5.3.3). In order for a one-particle state to have a definite value $\mathbf{p}$
for the momentum of the particle, it is necessary that the wave function should
satisfy an equation of the form (5.3.4), which in this case reads

$$ -i\hbar\nabla\psi(\mathbf{x}) = \mathbf{p}\,\psi(\mathbf{x}), $$

which has as a solution the de Broglie wave function

$$ \psi(\mathbf{x}) \propto \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar). $$


This raises a problem, which is endemic to observables that, like momentum,
have a continuous spectrum of possible values: the integral (5.3.2) is infinite
for a wave function of this form, which therefore cannot be normalized.
But we can find a wave function that is arbitrarily close to this form for an
arbitrarily large range of position:

$$ \psi(\mathbf{x}) = (\sqrt{\pi}L)^{-3/2} \exp(-\mathbf{x}^2/2L^2)\, \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar), $$

in which the constant factor $(\sqrt{\pi}L)^{-3/2}$ is chosen so that this $\psi$ satisfies the
normalization condition (5.3.2). The constant $L$ can be chosen as some very
large length, in which case the particle is almost certainly in a very large volume
$L^3$, where it almost certainly has the momentum $\mathbf{p}$.

The operator $X_{nj}$ that represents the $j$th component of the position vector of
the $n$th particle, acting on any function of position to its right such as a wave
function or the derivative of a wave function, simply multiplies that function by
the argument $x_{nj}$ and is obviously linear. Here again we have the problem that its
eigenfunctions cannot be normalized. In the one-particle case, a wave function
$\psi(\mathbf{x})$ that represents a state with a definite position $\mathbf{a}$ would have $\mathbf{X}\psi(\mathbf{x}) =
\mathbf{x}\psi(\mathbf{x})$ equal to $\mathbf{a}\psi(\mathbf{x})$ for all $\mathbf{x}$, so that it would have to vanish for all $\mathbf{x} \neq \mathbf{a}$, and
the integral (5.3.2) would vanish. But we can find a normalized wave function
that represents a state in which the particle is almost certainly very close to
position $\mathbf{a}$:

$$ \psi(\mathbf{x}) = (\sqrt{\pi}d)^{-3/2} \exp\bigl(-(\mathbf{x} - \mathbf{a})^2/2d^2\bigr), $$

where $d$ is here some very small length.




From operators we can construct other operators, which may or may not
represent physical quantities. Linear combinations provide a trivial example:
if $A$ and $B$ are operators while $a$ and $b$ are ordinary complex numbers, then
$aA + bB$ is an operator for which

$$ [aA + bB]\psi = aA\psi + bB\psi. $$


The product AB of any two operators A and B is defined by associativity: it is 
an operator that, acting on any function f to its right, gives the same result as 
acting first with B and then acting to the right on Bf with A: 


(AB) f = A(Bf). (5.3.5) 


The Hamiltonian 


One linear operator formed in this way is the Hamiltonian, which represents the
energy. For instance, for a single non-relativistic particle moving in a potential
$V$ the Hamiltonian is

$$ H = \frac{1}{2m}\mathbf{P}^2 + V(\mathbf{X}). \tag{5.3.6} $$

The time-independent Schrödinger equation (5.2.7) is just the statement
$H\psi = E\psi$, which tells us that $\psi$ represents a state with energy $E$. The
eigenfunctions of this Hamiltonian with negative eigenvalues are normalizable, a
condition we imposed in finding the bound state energy values in Section 5.2, but
there are also eigenfunctions with positive eigenvalues, representing unbound
states, which can only be normalized in the same approximate sense as the
eigenfunctions of position and momentum.


Adjoints 


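There is another process for producing new operators from other operators,
analogous to taking the complex conjugate of a number. For any operator $A$,
we define the adjoint $A^\dagger$ as the operator for which

$$ \int [A\psi_1]^*\psi_2 = \int \psi_1^*[A^\dagger\psi_2], \tag{5.3.7} $$

where $\psi_1$ and $\psi_2$ are any two wave functions. Here and below we use the
abbreviation

$$ \int \psi_1^*\psi_2 \equiv \int d^3x_1 \int d^3x_2 \cdots\, \psi_1^*(\mathbf{x}_1, \mathbf{x}_2, \ldots)\,\psi_2(\mathbf{x}_1, \mathbf{x}_2, \ldots). \tag{5.3.8} $$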


It is easy to see that the adjoint of a product is the product of the adjoints in the
opposite order:

$$ [AB]^\dagger = B^\dagger A^\dagger \tag{5.3.9} $$

because

$$ \int [AB\psi_1]^*\psi_2 = \int [B\psi_1]^*[A^\dagger\psi_2] = \int \psi_1^*[B^\dagger A^\dagger\psi_2]. $$

It is also obvious that the adjoint of a linear combination of operators is the same
linear combination of the adjoints, but with complex conjugate coefficients

$$ [aA + bB]^\dagger = a^*A^\dagger + b^*B^\dagger, \tag{5.3.10} $$

and the adjoint of an adjoint gives back the original operator:

$$ [A^\dagger]^\dagger = A. \tag{5.3.11} $$

There is an important class of linear operators that are their own adjoints

$$ A^\dagger = A. \tag{5.3.12} $$


A physical quantity represented by such an operator can only have real values
in any state, for if $A\psi = a\psi$ for some wave function $\psi$, then

$$ a\int\psi^*\psi = \int\psi^*[A\psi] = \int[A\psi]^*\psi = a^*\int\psi^*\psi, $$

so $a = a^*$. Such operators are called self-adjoint, or Hermitian. The coordinate
operator $X_{nj}$ is obviously self-adjoint, and the momentum operator $P_{nj}$ is too,
because the minus sign produced by taking the complex conjugate of $-i$ is
cancelled by the minus sign produced by integration by parts:

$$
\begin{aligned}
\int [P_{nj}\psi_1]^*\psi_2 &= \int \left[-i\hbar\frac{\partial\psi_1}{\partial x_{nj}}\right]^*\psi_2 = +i\hbar\int\left[\frac{\partial\psi_1^*}{\partial x_{nj}}\right]\psi_2 \\
&= -i\hbar\int\psi_1^*\left[\frac{\partial\psi_2}{\partial x_{nj}}\right] = \int\psi_1^*[P_{nj}\psi_2].
\end{aligned}
$$

Assuming the potential to be real, the Hamiltonian (5.3.6) is also self-adjoint,
so the allowed energy values are all real. Also, all components of the angular
momentum operator $\mathbf{L} = \mathbf{X} \times \mathbf{P}$ for a single particle are self-adjoint. For
instance

$$ L_3^\dagger = (X_1P_2 - X_2P_1)^\dagger = P_2X_1 - P_1X_2 = X_1P_2 - X_2P_1, $$

the last step being valid because $P_2$ does not act on $x_1$ and $P_1$ does not act on
$x_2$; both only act on whatever function of $\mathbf{x}$ that $L_3$ is acting on. Likewise of
course for $L_1$ and $L_2$.

Self-adjoint operators have another important property. If $A$ is self-adjoint
and $A\psi_1 = a_1\psi_1$ and $A\psi_2 = a_2\psi_2$, then

$$ a_1\int\psi_2^*\psi_1 = \int\psi_2^*A\psi_1 = \int[A\psi_2]^*\psi_1 = a_2\int\psi_2^*\psi_1, $$

so if $a_1 \neq a_2$ then $\int\psi_2^*\psi_1 = 0$. Such wave functions are said to be
orthogonal. For instance any two different spherical harmonics such as those listed in
Section 5.2 are orthogonal, because they are eigenfunctions of the self-adjoint
operators $\mathbf{L}^2$ and $L_3$ with different eigenvalues $\hbar^2\ell(\ell + 1)$ and/or $\hbar m$.


Expectation Values 


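The interpretation given by Eq. (5.3.1) of $|\psi|^2$ as a probability density tells us
that if we measure any function $f(\mathbf{x}_1, \mathbf{x}_2, \ldots)$ of positions many times in the
state represented by wave function $\psi$, the mean value of the measured values
will be

$$ \langle f\rangle_\psi = \int f(\mathbf{x}_1, \mathbf{x}_2, \ldots)\,|\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)|^2\, d^3x_1\, d^3x_2 \cdots, $$

provided that $\psi$ is normalized so that $\int |\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)|^2\, d^3x_1\, d^3x_2 \cdots = 1$.
Since $\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)$ is an eigenfunction of the operator $f(\mathbf{X}_1, \mathbf{X}_2, \ldots)$ with
eigenvalue $f(\mathbf{x}_1, \mathbf{x}_2, \ldots)$, this can be written

$$ \langle f\rangle_\psi = \int \psi^*(\mathbf{x}_1, \mathbf{x}_2, \ldots)\,\bigl[f(\mathbf{X}_1, \mathbf{X}_2, \ldots)\,\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)\bigr]\, d^3x_1\, d^3x_2 \cdots $$

or, in our abbreviated notation,

$$ \langle f\rangle_\psi = \int \psi^* F\psi, $$

where $F$ is the operator $f(\mathbf{X}_1, \mathbf{X}_2, \ldots)$. It is only a short step from this to
a third postulate of quantum mechanics, which states that when any physical
quantity represented by an operator $A$ is measured many times, each time in the
state represented by normalized wave function $\psi$, then the average value found
for this quantity is

$$ \langle A\rangle_\psi = \int \psi^* A\psi \tag{5.3.13} $$

or, if the wave function is not normalized,

$$ \langle A\rangle_\psi = \frac{\int \psi^* A\psi}{\int \psi^*\psi}. $$

This is called the expectation value of $A$ for the wave function $\psi$. For a
self-adjoint operator

$$ \int \psi^* A\psi = \int [A\psi]^*\psi = \left(\int \psi^* A\psi\right)^*, $$

so the expectation value of a self-adjoint operator is real for any wave function.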

It is obvious that if $A\psi = a\psi$ then the expectation value $\langle A\rangle_\psi$ of $A$ for
the wave function $\psi$ is just $a$, but expectation values give useful information
even for wave functions that do not represent states with a definite value for
the observable. For instance, the mean square spread of values of an observable
represented by $A$ around its mean value is

$$ (\Delta A)^2 = \bigl\langle (A - \langle A\rangle)^2 \bigr\rangle. \tag{5.3.14} $$


Probabilities 


Suppose a physical system is in a state represented by a normalized wave
function $\psi$, and we measure an observable represented by a Hermitian operator $A$
that (to start with the simplest case) has only discrete non-degenerate
eigenvalues $a_n$ with eigenfunctions $\phi_n$. Even though $\psi$ will not in general be one of these
eigenfunctions, it can generally be expanded as a series of terms proportional to
the eigenfunctions

$$ \psi = \sum_n c_n\phi_n, $$

where $c_n$ are some numerical coefficients. (The proof of the possibility of such
an expansion depends on detailed properties of the operator $A$.) As we have
seen, such eigenfunctions are orthogonal and, if properly normalized, can be
taken as orthonormal in the sense that

$$ \int \phi_n^*\phi_m = \delta_{nm} \equiv \begin{cases} 1 & n = m \\ 0 & n \neq m. \end{cases} $$

We can find the coefficients $c_m$ by multiplying the expansion with $\phi_m^*$ and
integrating over all values of the arguments, which gives

$$ \int \phi_m^*\psi = \sum_n c_n \int \phi_m^*\phi_n = c_m. $$

The expectation value of the observable $A$ is then

$$ \langle A\rangle_\psi = \int \psi^* A\psi = \sum_{nm} c_n^* c_m \int \phi_n^* A\phi_m $$

or, since $A\phi_m = a_m\phi_m$,

$$ \langle A\rangle_\psi = \sum_{nm} c_n^* c_m a_m \int \phi_n^*\phi_m = \sum_m a_m |c_m|^2 = \sum_n a_n \left|\int \phi_n^*\psi\right|^2. $$

Since a corresponding result is true for any function of $A$, the inevitable
interpretation is that when the observable represented by $A$ is measured in a state
represented by the normalized wave function $\psi$, the probability of finding the
result $a_n$ is

$$ P(a_n) = \left|\int \phi_n^*\psi\right|^2. \tag{5.3.15} $$

This is known as the Born rule and can be taken instead of Eq. (5.3.13) as the
third postulate of quantum mechanics.
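A finite-dimensional analog may make the Born rule concrete. In the sketch below (an illustration under stated assumptions, not taken from the text) wave functions are replaced by two-component vectors, the integral $\int\phi^*\psi$ by a sum over components, and the observable by the Hermitian matrix with rows $(0, 1)$ and $(1, 0)$, whose normalized eigenvectors are $(1, \pm 1)/\sqrt{2}$ with eigenvalues $\pm 1$. The state and all names are arbitrary choices.

```python
import math

def inner(phi, psi):
    """Discrete stand-in for the integral of phi* psi."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def apply(matrix, psi):
    """Matrix acting on a vector, the analog of an operator acting on psi."""
    return [sum(row[j] * psi[j] for j in range(len(psi))) for row in matrix]

A = [[0, 1], [1, 0]]                      # Hermitian "observable" (assumed example)
s = 1 / math.sqrt(2)
eigen = [(+1.0, [s, s]), (-1.0, [s, -s])]  # (eigenvalue, normalized eigenvector)
psi = [0.6, 0.8]                          # normalized state, chosen arbitrarily

# Born rule: P(a_n) = |<phi_n, psi>|^2
probabilities = {a: abs(inner(phi, psi)) ** 2 for a, phi in eigen}
mean_from_born = sum(a * p for a, p in probabilities.items())
mean_direct = inner(psi, apply(A, psi)).real   # analog of Eq. (5.3.13)

print(probabilities, mean_from_born, mean_direct)
```

The probabilities add up to one, and weighting the eigenvalues by them reproduces the expectation value computed directly from the state, which is the content of the chain of equalities leading to Eq. (5.3.15).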


Continuum Limit 


We can also calculate probability densities for an observable that takes a
continuum of values, by taking the limit of the case in which the observable takes a
very large number of very close discrete values. If the number of values of the
index $n$ for which the eigenvalue $a_n$ is in a range from $a$ to $a + da$ is $N(a)\,da$,
then in the state represented by normalized wave function $\psi$ the probability of
finding the observable in this range is

$$ dP(a) = N(a)\,da \times \left|\int \phi_a^*\psi\right|^2, \tag{5.3.16} $$

where $\phi_a$ is any normalized eigenfunction of $A$ with eigenvalue in this narrow
range. For any such observable, instead of working with the conventional wave
functions $\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots)$ we can use wave functions

$$ \tilde{\psi}(a) \equiv \sqrt{N(a)}\int \phi_a^*\psi, \tag{5.3.17} $$

for which Eq. (5.3.16) gives the probability of finding the observable in the
range from $a$ to $a + da$:

$$ dP(a) = |\tilde{\psi}(a)|^2\, da. \tag{5.3.18} $$

The classic example of such continuum operators and alternative wave functions
is provided by momentum.


Momentum Space 


Consider for instance a particle in a cubical box of edge $L$. The normalized
wave function representing a state with definite momentum $\mathbf{p}$ is

$$ \phi_{\mathbf{p}}(\mathbf{x}) = L^{-3/2} \exp(i\mathbf{p}\cdot\mathbf{x}/\hbar). $$

Pretty much as we saw for photons in Section 3.2, the allowed momenta take
the form $\mathbf{p} = 2\pi\mathbf{n}\hbar/L$, where $\mathbf{n}$ is a vector with integer components, so that
this wave function should have the same values on opposite sides of the box.
In a state represented by a normalized wave function $\psi(\mathbf{x})$, the probability of
finding the momentum to have value $\mathbf{p}$ is

$$ P(\mathbf{p}) = \left|\int \phi_{\mathbf{p}}^*(\mathbf{x})\,\psi(\mathbf{x})\, d^3x\right|^2, $$

the integral here taken over the interior of the box. We can pass to the continuum
limit by taking the box to be very large, so that the allowed momentum values
are very close together. Since the allowed vectors $\mathbf{n}$ form a lattice of cubes, each
of volume unity, the number $dN$ of these allowed momenta in a small volume of
momentum space $d^3p$ around $\mathbf{p}$ equals the corresponding volume in the space
of vectors $\mathbf{n}$:

$$ dN = d^3n = \left(\frac{L}{2\pi\hbar}\right)^3 d^3p, $$

so the probability of finding the momentum in this range is

$$ dP = \left(\frac{L}{2\pi\hbar}\right)^3 d^3p \times \left|\int \phi_{\mathbf{p}}^*(\mathbf{x})\,\psi(\mathbf{x})\, d^3x\right|^2 = |\tilde{\psi}(\mathbf{p})|^2\, d^3p, \tag{5.3.19} $$

where

$$ \tilde{\psi}(\mathbf{p}) = \left(\frac{L}{2\pi\hbar}\right)^{3/2}\int \phi_{\mathbf{p}}^*(\mathbf{x})\,\psi(\mathbf{x})\, d^3x \;\to\; (2\pi\hbar)^{-3/2}\int \exp(-i\mathbf{p}\cdot\mathbf{x}/\hbar)\,\psi(\mathbf{x})\, d^3x, \tag{5.3.20} $$

with the last integral taken over all space. We can just as well say that the state
of the system is represented by the momentum-space wave function $\tilde{\psi}(\mathbf{p})$ as by
the coordinate-space wave function $\psi(\mathbf{x})$. Indeed, as we will see in Section 5.10,
both the coordinate-space wave function $\psi$ and the momentum-space wave
function $\tilde{\psi}$ are nothing but the components in different bases of a vector in
an abstract space, known as Hilbert space.
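The density of allowed momenta used above, $(L/2\pi\hbar)^3$ per unit volume of momentum space, can be checked by direct counting. The sketch below assumes units with $\hbar = 1$ and counts all lattice momenta inside a sphere $|\mathbf{p}| \leq p_{\max}$; the function names and the chosen box size are illustrative only.

```python
import math

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

def count_momenta(box_edge, p_max):
    """Count allowed momenta p = 2*pi*hbar*n/L with |p| <= p_max by brute force."""
    step = 2 * math.pi * HBAR / box_edge
    n_max = int(p_max / step) + 1
    count = 0
    for n1 in range(-n_max, n_max + 1):
        for n2 in range(-n_max, n_max + 1):
            for n3 in range(-n_max, n_max + 1):
                if step * math.sqrt(n1 * n1 + n2 * n2 + n3 * n3) <= p_max:
                    count += 1
    return count

def continuum_estimate(box_edge, p_max):
    """(L / 2 pi hbar)^3 times the volume of the momentum-space sphere."""
    return (box_edge / (2 * math.pi * HBAR)) ** 3 * (4 / 3) * math.pi * p_max ** 3

L = 2 * math.pi * 10      # box edge chosen so that p_max = 1 spans ten lattice steps
print(count_momenta(L, 1.0), continuum_estimate(L, 1.0))
```

Even for this modest box the two numbers agree to within about a percent, and the agreement improves as the box is enlarged, which is the sense in which the sum over allowed momenta passes to an integral.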


Commutation Relations 


The commutator of two operators A and B, written [A, B], is defined by 
[A,B] =AB—BA. (5.3.21) 


In order for the physical quantities represented by operators $A$ and $B$ to have
definite numerical values $\alpha$ and $\beta$ in a state represented by wave function $\psi$ it
is necessary that the commutator $[A, B]$ acting on $\psi$ should vanish, because

$$ [A, B]\psi = \beta A\psi - \alpha B\psi = \beta\alpha\psi - \alpha\beta\psi = 0. $$

In particular, it is never possible for any state to have definite values for both of
two quantities represented by operators whose commutator is simply a non-zero
number, because such a commutator can never give zero when acting on any $\psi$.

It is helpful in evaluating commutators to note that commutation acts like
differentiation. For instance,

$$
\begin{aligned}
[A, BC] &= ABC - BCA = ABC - BAC + BAC - BCA \\
&= [A, B]C + B[A, C].
\end{aligned}
$$

Thus, since all components of momentum commute with one another, they also
commute with any function only of momenta, such as the total kinetic energy
operator $\sum_n \mathbf{P}_n^2/2m_n$.
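Because the product rule for commutators is a purely algebraic identity, it can be tested on any non-commuting objects. The following sketch uses small integer matrices as stand-ins for operators; the particular matrices and helper names are arbitrary choices made for illustration.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def commutator(a, b):
    """[A, B] = AB - BA, as in Eq. (5.3.21)."""
    return sub(matmul(a, b), matmul(b, a))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, -1]]

lhs = commutator(A, matmul(B, C))                                       # [A, BC]
rhs = add(matmul(commutator(A, B), C), matmul(B, commutator(A, C)))     # [A,B]C + B[A,C]
print(lhs, rhs)
```

Integer entries are used so that the two sides can be compared exactly, with no rounding error.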


Uncertainty Principle 


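Note that the commutator of $X_{ni}$ and $P_{mj}$ acting on any multi-particle wave
function $\psi$ is

$$
\begin{aligned}
[X_{ni}, P_{mj}]\psi &= -i\hbar x_{ni}\frac{\partial\psi}{\partial x_{mj}} + i\hbar\frac{\partial}{\partial x_{mj}}(x_{ni}\psi) \\
&= -i\hbar x_{ni}\frac{\partial\psi}{\partial x_{mj}} + i\hbar\delta_{ij}\delta_{nm}\psi + i\hbar x_{ni}\frac{\partial\psi}{\partial x_{mj}} = +i\hbar\delta_{ij}\delta_{nm}\psi,
\end{aligned}
$$

a result we write as a commutation relation

$$ [X_{ni}, P_{mj}] = i\hbar\delta_{ij}\delta_{nm}. \tag{5.3.22} $$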


This shows in particular that there can be no state in which a component of some 
particle’s position and the same component of the same particle’s momentum 
both have definite values. 

Indeed, using this commutation relation, it is possible to set a lower bound
on the product of the root mean square spread of values of position and
momentum:

$$ \Delta X_{ni}\,\Delta P_{ni} \geq \hbar/2, \tag{5.3.23} $$

a result known as the Heisenberg uncertainty principle.⁷

We can see this in a simple example. The normalized wave function for a
particle confined to a distance $d$ around some position $\mathbf{a}$ can be written as a
superposition of wave functions with definite momentum:

$$ (\sqrt{\pi}d)^{-3/2}\exp\bigl(-(\mathbf{x}-\mathbf{a})^2/2d^2\bigr) = \left(\frac{d}{2\pi^{3/2}\hbar^2}\right)^{3/2}\int d^3p\, \exp\bigl(i\mathbf{p}\cdot[\mathbf{x}-\mathbf{a}]/\hbar\bigr)\exp(-d^2\mathbf{p}^2/2\hbar^2). \tag{5.3.24} $$

We see that if the spread in values of $\mathbf{x}$ is of order $d$, then the spread in values
of $\mathbf{p}$ is of order $\hbar/d$, and the product of the spreads is of order $\hbar$, in accordance
with the uncertainty principle.
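As a numerical illustration, the spreads can be computed on a grid for a one-dimensional analog of the Gaussian packets used in this section. The sketch below assumes units with $\hbar = 1$, adds an arbitrary mean momentum `p0`, and approximates the momentum operator by a central difference; the names and parameter values are hypothetical. For a Gaussian the product of the spreads comes out equal to $\hbar/2$, so the bound (5.3.23) is saturated.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (an assumption of this sketch)

def packet(x, a, d, p0):
    """One-dimensional Gaussian packet centered at a, width d, mean momentum p0."""
    envelope = (math.sqrt(math.pi) * d) ** -0.5 * math.exp(-(x - a) ** 2 / (2 * d ** 2))
    return envelope * cmath.exp(1j * p0 * x / HBAR)

def spreads(a=0.3, d=0.7, p0=2.0, half_width=10.0, n=4001):
    """Return (norm, Delta x, Delta p) from Riemann sums on a uniform grid."""
    dx = 2 * half_width * d / (n - 1)
    xs = [a - half_width * d + i * dx for i in range(n)]
    psi = [packet(x, a, d, p0) for x in xs]
    # central-difference derivative at interior grid points
    dpsi = [(psi[i + 1] - psi[i - 1]) / (2 * dx) for i in range(1, n - 1)]
    xs, psi = xs[1:-1], psi[1:-1]
    norm = sum(abs(f) ** 2 for f in psi) * dx
    mean_x = sum(x * abs(f) ** 2 for x, f in zip(xs, psi)) * dx / norm
    mean_x2 = sum(x * x * abs(f) ** 2 for x, f in zip(xs, psi)) * dx / norm
    mean_p = sum((f.conjugate() * (-1j * HBAR) * g).real for f, g in zip(psi, dpsi)) * dx / norm
    mean_p2 = HBAR ** 2 * sum(abs(g) ** 2 for g in dpsi) * dx / norm
    return norm, math.sqrt(mean_x2 - mean_x ** 2), math.sqrt(mean_p2 - mean_p ** 2)

norm, delta_x, delta_p = spreads()
print(norm, delta_x, delta_p, delta_x * delta_p)
```

Here $\langle P^2\rangle$ is evaluated as $\hbar^2\int|\partial\psi/\partial x|^2$, which follows from the self-adjointness of $P$.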


7 W. Heisenberg, Zeit. Phys. 43, 172 (1927). For a textbook proof, see Weinberg, Lectures on Quantum
Mechanics, listed in the bibliography.


Time Dependence


In the earliest formulation of quantum mechanics, wave functions were given a
time dependence governed by the time-dependent Schrödinger equation:

$$ i\hbar\frac{\partial}{\partial t}\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots; t) = H\psi(\mathbf{x}_1, \mathbf{x}_2, \ldots; t), \tag{5.3.25} $$

where $H$ is the Hamiltonian operator, representing the energy of the system.
The wave function of a state with a definite energy $E$ thus has a trivial
time-dependence, contained in a phase factor $\exp(-iEt/\hbar)$. The expectation value
(5.3.13) of any operator for such a wave function is independent of time. More
generally, assuming that the Hamiltonian is self-adjoint, the time dependence
of the expectation value of an observable represented by an operator $A$ in a
state represented by a normalized wave function $\psi$ that satisfies Eq. (5.3.25) is
governed by the differential equation

$$
\begin{aligned}
\frac{d}{dt}\langle A\rangle &= \int \psi^* A\left(-\frac{i}{\hbar}\right)H\psi + \int\left[\left(-\frac{i}{\hbar}\right)H\psi\right]^* A\psi \\
&= \left(-\frac{i}{\hbar}\right)\left[\int\psi^* AH\psi - \int\psi^* HA\psi\right]
\end{aligned}
$$

and therefore

$$ i\hbar\frac{d}{dt}\langle A\rangle = \langle[A, H]\rangle. \tag{5.3.26} $$

In particular, the normalization integral $\int\psi^*\psi$ is the expectation value (5.3.13)
of the unit operator, which acting on any wave function just gives the same wave
function. Since this operator commutes with the Hamiltonian (or anything else),
the normalization integral is constant in time; once normalized, wave functions
remain normalized.
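For instance, in the case of a single particle moving in an external potential,
with Hamiltonian (5.3.6),

$$ H = \frac{1}{2m}\mathbf{P}^2 + V(\mathbf{X}), $$

we have

$$ [\mathbf{X}, H] = \frac{1}{2m}[\mathbf{X}, \mathbf{P}^2] = \frac{i\hbar}{m}\mathbf{P} \tag{5.3.27} $$

and

$$ [\mathbf{P}, H] = [\mathbf{P}, V(\mathbf{X})] = -i\hbar\nabla V(\mathbf{X}). \tag{5.3.28} $$

The equations of motion of the expectation values are then

$$ \frac{d}{dt}\langle\mathbf{X}\rangle = \langle\mathbf{P}\rangle/m, \qquad \frac{d}{dt}\langle\mathbf{P}\rangle = -\langle\nabla V(\mathbf{X})\rangle. \tag{5.3.29} $$

This is much the same as in classical physics, but note that $\langle\nabla V(\mathbf{X})\rangle$ is not the
same as $\nabla V(\langle\mathbf{X}\rangle)$, so this is not a closed set of equations.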


Conservation Laws 


Operators that commute with the Hamiltonian deserve special attention, in part
because their expectation values (and the expectation values of any functions
of them) are time-independent for any wave function. These represent what are
called conserved quantities. Among these operators is of course $H$ itself, so the
mean energy $\langle H\rangle$ of any state is constant in time. The momenta of particles
moving in an external potential are not conserved, but the total momentum of a
number of particles is conserved if the potential depends only on the differences
of their coordinates. For instance, for such a two-particle system,

$$ [\mathbf{P}_1 + \mathbf{P}_2, V(\mathbf{X}_1 - \mathbf{X}_2)] = -i\hbar\nabla_1 V(\mathbf{X}_1 - \mathbf{X}_2) - i\hbar\nabla_2 V(\mathbf{X}_1 - \mathbf{X}_2) = 0. $$

What about angular momentum? For simplicity, consider just a single particle,
whose orbital angular momentum is $\mathbf{L} = \mathbf{X} \times \mathbf{P}$, in a potential that depends
only on $R = \sqrt{\mathbf{X}^2}$. It is straightforward to work out that the commutators of a
general linear combination $\mathbf{e}\cdot\mathbf{L}$ of the components of $\mathbf{L}$ with the position and
momentum operators are

$$ [\mathbf{e}\cdot\mathbf{L}, \mathbf{X}] = -i\hbar\,\mathbf{e}\times\mathbf{X}, \tag{5.3.30} $$

$$ [\mathbf{e}\cdot\mathbf{L}, \mathbf{P}] = -i\hbar\,\mathbf{e}\times\mathbf{P}. \tag{5.3.31} $$

For instance,

$$
\begin{aligned}
[\mathbf{e}\cdot\mathbf{L}, X_1] &= \bigl[\bigl(e_1(X_2P_3 - X_3P_2) + e_2(X_3P_1 - X_1P_3) + e_3(X_1P_2 - X_2P_1)\bigr), X_1\bigr] \\
&= -i\hbar(e_2X_3 - e_3X_2) = -i\hbar(\mathbf{e}\times\mathbf{X})_1.
\end{aligned}
$$

It follows that each component of $\mathbf{L}$ commutes with $\mathbf{P}^2$ and $\mathbf{X}^2$, and hence also
with $V(\sqrt{\mathbf{X}^2})$, and so with the Hamiltonian.

Another reason for us to give special attention to operators that commute with
the Hamiltonian is that states with a given energy can be classified according
to the eigenvalues of these conserved quantities. For instance, for a Coulomb
potential the states with a given principal quantum number $n$ and hence with a
given energy can be classified according to the eigenvalues of $\mathbf{L}^2$ and $L_3$, both
of which commute with the Hamiltonian as well as with each other. Of course,
$L_1$ and $L_2$ also commute with the Hamiltonian and with $\mathbf{L}^2$, but as we shall
see in the next section they do not commute with each other, or with $L_3$, so the
best we can do is to classify states according to the eigenvalues of $\mathbf{L}^2$ and $L_3$ as
well as $H$.


Heisenberg and Schrödinger Pictures


The formalism described here, in which the wave function depends on time but
operators are time-independent, is known as the Schrödinger picture. There is
another formalism, known as the Heisenberg picture, in which wave functions
are time-independent and operators depend on time. To make clear the relation
between these pictures, just for the present we will use subscripts H and S to
distinguish wave functions and operators in the Heisenberg and Schrödinger
pictures.

The time-dependent Schrödinger equation (5.3.25) has a formal solution

$$ \psi_S(t) = e^{-iHt/\hbar}\,\psi_S(0), $$

so, if we define the Heisenberg-picture wave function as the Schrödinger-picture
wave function at zero time,

$$ \psi_H \equiv \psi_S(0), \tag{5.3.32} $$

then wave functions in the two pictures are related by

$$ \psi_S(t) = e^{-iHt/\hbar}\,\psi_H. \tag{5.3.33} $$

In the Heisenberg picture, in order to preserve Eq. (5.3.26) we must give
operators a time dependence with

$$ i\hbar\frac{d}{dt}A_H(t) = [A_H(t), H]. \tag{5.3.34} $$

The commutators of position and momentum with the Hamiltonian given in
Eqs. (5.3.27) and (5.3.28) show that Eq. (5.3.34) gives these operators in the
Heisenberg picture the same time dependence as the corresponding quantities
in classical mechanics. To satisfy Eq. (5.3.34), we define the Heisenberg-picture
operator $A_H(t)$ in terms of the Schrödinger-picture operator $A_S$ representing the
same observable by

$$ A_H(t) = e^{iHt/\hbar}A_S\, e^{-iHt/\hbar}. \tag{5.3.35} $$

We can go back and forth between the two pictures. For instance, for an arbitrary
operator $A$ and wave function $\psi$, the Schrödinger-picture wave function
corresponding to the Heisenberg-picture operator $A_H(t)$ acting on the
Heisenberg-picture wave function $\psi_H$ is

$$ e^{-iHt/\hbar}A_H(t)\,\psi_H = A_S\, e^{-iHt/\hbar}\psi_H = A_S\psi_S(t), $$

just as if we had worked from the beginning in the Schrödinger picture.

The Heisenberg picture and Schrödinger picture are physically equivalent,
but useful in different contexts. The Schrödinger picture is more naturally used
in calculating bound state energies, and, as we shall see in Section 5.6, it can
also be used for scattering processes. The Heisenberg picture is invaluable when
we want to use known equations of motion for observables to motivate a choice
of Hamiltonian, as we will do in Section 5.8. Also, in field theories where
observables depend on position, in order to preserve the appearance of Lorentz
invariance it is necessary to work in the Heisenberg picture, so that these
observables will depend on time as well as on space coordinates.
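The equivalence of the two pictures can be seen in a two-state toy model. The sketch below is only an illustration: it assumes units with $\hbar = 1$, works in a basis of energy eigenstates so that $e^{-iHt/\hbar}$ is diagonal, and uses arbitrary values for the energies, the observable, and the state.

```python
import cmath
import math

HBAR = 1.0                 # units with hbar = 1 (an assumption of this sketch)
ENERGIES = [1.0, 2.5]      # eigenvalues of H; H is diagonal in the chosen basis
A_S = [[0, 1], [1, 0]]     # Schrodinger-picture operator, constant in time
PSI_H = [0.6, 0.8]         # Heisenberg-picture state, equal to psi_S(0)

def evolution_phases(t):
    """Diagonal entries of exp(-i H t / hbar) in the energy basis."""
    return [cmath.exp(-1j * e * t / HBAR) for e in ENERGIES]

def schrodinger_state(t):
    """psi_S(t) = exp(-i H t / hbar) psi_H, as in Eq. (5.3.33)."""
    return [u * c for u, c in zip(evolution_phases(t), PSI_H)]

def heisenberg_operator(t):
    """A_H(t) = exp(+i H t / hbar) A_S exp(-i H t / hbar), as in Eq. (5.3.35)."""
    u = evolution_phases(t)
    return [[u[i].conjugate() * A_S[i][j] * u[j] for j in range(2)] for i in range(2)]

def expectation(op, psi):
    """Discrete analog of the integral of psi* A psi."""
    return sum(psi[i].conjugate() * op[i][j] * psi[j]
               for i in range(2) for j in range(2)).real

for t in (0.0, 0.7, 3.1):
    print(t, expectation(A_S, schrodinger_state(t)),
          expectation(heisenberg_operator(t), PSI_H))
```

Both columns agree at every time and oscillate as $0.96\cos[(E_2 - E_1)t/\hbar]$, the value expected for this choice of state and observable.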


5.4 Spin and Orbital Angular Momentum 


Spin Discovered 


The counting of states described in Section 5.2 was already known in 1925 to
be in conflict with spectroscopic data. The problem emerged most clearly in the
study of alkali metals. These are elements such as lithium, sodium, potassium,
etc. that were known to readily lose a single electron.⁸ In the atomic models
of the time, this meant that an alkali metal atom has one loosely
bound electron outside inner shells of more tightly bound electrons. The
potential felt by this outer electron is spherically symmetric but it is not a Coulomb
potential, which would be proportional to $1/r$; so because $L_3$ and $\mathbf{L}^2$ commute
with each other and with $H$ it was still expected that states of definite energy
would also have definite values $\hbar^2\ell(\ell + 1)$ for $\mathbf{L}^2$, and that there would be
$2\ell + 1$ states of equal energy for any given $\ell$, distinguished by different
eigenvalues $\hbar m$ of $L_3$, but no further degeneracy was expected. States could still
be labeled with a principal quantum number $n$, defined so that the number of
nodes of the wave function (values of $r$ where the wave function vanishes) is
$n - \ell - 1$, as it is in hydrogen, but, unlike the case of hydrogen, here the
energies depend on $\ell$ as well as on $n$.
There is a very well studied “D-line” in the spectrum of sodium vapor (which
gives sodium vapor lamps their orange color) with wavelength about 5890
angstroms, interpreted as a $3p \to 3s$ transition between states of the outermost
electron with $n = 3$. But even with moderate resolution, spectroscopists were
able to see that this line was doubled, having two components with wavelengths
5896 angstroms and 5890 angstroms. Wolfgang Pauli (1900-1958) was led
to suggest that, on the basis of this and other data, there is a fourth quantum
number, besides $n$, $\ell$, and $m$, which takes just two values in all states with $\ell \geq 1$.
But the physical significance of this quantum number was at first mysterious.
Then in 1925 two young Dutch physicists, Samuel Goudsmit (1902-1978)
and George Uhlenbeck (1900-1988), suggested⁹ that the extra quantum number
was associated with an internal angular momentum, or spin, of the electron. At
first this idea seemed absurd. If the spin $\mathbf{S}$ is anything like $\mathbf{L}$, then any one
component of $\mathbf{S}$ should take $2s + 1$ values, running by unit steps from $-\hbar s$ up
to $+\hbar s$, where $s$ is given by $\mathbf{S}^2 = \hbar^2 s(s + 1)$. But to have $2s + 1 = 2$ we need
$s = 1/2$, while $\ell$ is always an integer.

The notion of spin $s = 1/2$ was not understood until physicists adopted a
more mature view of the nature of angular momentum, that it is an operator
whose existence and properties are dictated by the invariance of the laws of
nature under rotations, rather than by experience with classical spinning bodies.
This takes some explanation.


8 With the charge of the electron and the atomic weights of these elements known, it could be concluded
from the ratio of the metal mass produced in electrolysis to the electric charge used that one electron is
needed to convert one ion in a solution of the metal salt to an alkali metal atom, so the atom in becoming
an ion had to lose just one electron. This was in contrast with metals like beryllium, magnesium, calcium,
etc., which require two electrons to convert an ion to an atom.

9 S. Goudsmit and G. Uhlenbeck, Naturwiss. 13, 953 (1925).


Rotations 
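
In general, an infinitesimal rotation changes any vector $\mathbf{v}$ by an amount

$$ \delta\mathbf{v} = \mathbf{e}\times\mathbf{v}, \tag{5.4.1} $$

where $\mathbf{e}$ is an infinitesimal 3-vector characterizing the rotation. This is a rotation,
because it leaves all scalar products unchanged:

$$ \delta(\mathbf{v}\cdot\mathbf{w}) = \mathbf{v}\cdot(\mathbf{e}\times\mathbf{w}) + (\mathbf{e}\times\mathbf{v})\cdot\mathbf{w} = 0. \tag{5.4.2} $$

It is in fact (though we don’t need to know this here) a rotation by an
infinitesimal angle of $|\mathbf{e}|$ radians counterclockwise around the direction of $\mathbf{e}$. For
instance, if $\mathbf{e}$ is in the 3-direction then (5.4.1) gives

$$ \delta v_1 = -|\mathbf{e}|v_2, \qquad \delta v_2 = +|\mathbf{e}|v_1, \qquad \delta v_3 = 0. $$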
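A quick numerical check of Eqs. (5.4.1) and (5.4.2) is sketched below, with arbitrary test vectors and a small but finite $\mathbf{e}$ standing in for an infinitesimal one. The change in the scalar product is then not exactly zero but of second order in $|\mathbf{e}|$, which is what "unchanged" means to first order.

```python
def cross(a, b):
    """Vector product a x b of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate_infinitesimal(v, e):
    """v + e x v, the rotated vector to first order in e, as in Eq. (5.4.1)."""
    return [vi + ci for vi, ci in zip(v, cross(e, v))]

eps = 1e-6
e = [0.0, 0.0, eps]            # small rotation about the 3-axis
v = [1.0, 2.0, 3.0]
w = [-2.0, 0.5, 4.0]

delta_v = cross(e, v)          # expected: (-eps*v2, +eps*v1, 0)
change = dot(rotate_infinitesimal(v, e), rotate_infinitesimal(w, e)) - dot(v, w)
print(delta_v, change)         # change is of order eps**2
```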


Now, suppose that one observer sees that a physical system is in a state 
represented by a wave function $\psi$, and suppose a second observer views the 
same state using coordinate axes that have been subjected to a slight rotation, 
which changes any vector $\mathbf{v}$ by an infinitesimal amount $\mathbf{e} \times \mathbf{v}$. What does she 
see? For $\mathbf{e}$ infinitesimal the change in the wave function must be linear in $\mathbf{e}$, and 
can therefore be written 


\[ \delta\psi = (i/\hbar)\, \mathbf{e}\cdot\mathbf{J}\,\psi \tag{5.4.3} \]


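where $\mathbf{J}$ is some triplet of operators and the factor $(i/\hbar)$ is inserted for future 
convenience. We would not want the rotation to change the total probability 
$\int \psi^*\psi = 1$ that the particles in the system are somewhere, so we require that 


\[ 0 = \delta\int \psi^*\psi = \int \big[(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\,\psi\big]^*\psi + \int \psi^*\,(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\,\psi
     = (i/\hbar)\int \psi^*\,\mathbf{e}\cdot\big(-\mathbf{J}^\dagger + \mathbf{J}\big)\psi \]

and therefore $\mathbf{J}$ must be self-adjoint: 


\[ \mathbf{J}^\dagger = \mathbf{J} . \tag{5.4.4} \]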
(We are here using the abbreviation (5.3.8): 

\[ \int \psi_a^*\psi_b = \int d^3x_1 \int d^3x_2 \cdots\; \psi_a^*(\mathbf{x}_1, \mathbf{x}_2, \ldots)\, \psi_b(\mathbf{x}_1, \mathbf{x}_2, \ldots) \]

and the definition (5.3.7) of the adjoint $A^\dagger$ of any operator $A$, except that as we 
shall see we must include discrete variables along with coordinates.) 

In order for the transformation of the wave function to correspond to a rota- 
tion, it is necessary that it should produce a rotation of expectation values. That 
is, if V is an operator representing an observable that transforms as a vector 
under the general infinitesimal rotation (5.4.1), we must have 


\[ \delta\langle\mathbf{V}\rangle = \mathbf{e} \times \langle\mathbf{V}\rangle . \tag{5.4.5} \]
From Eqs. (5.4.3) and (5.4.4) we see that 


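\[ \delta\int \psi^*\mathbf{V}\psi = \int \big[(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\,\psi\big]^*\,\mathbf{V}\psi + \int \psi^*\,\mathbf{V}\,(i/\hbar)\,\mathbf{e}\cdot\mathbf{J}\,\psi
     = -(i/\hbar)\int \psi^*\,[\mathbf{e}\cdot\mathbf{J},\, \mathbf{V}]\,\psi \]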
and therefore we require 
\[ [\mathbf{e}\cdot\mathbf{J},\, \mathbf{V}] = -i\hbar\, \mathbf{e}\times\mathbf{V} . \tag{5.4.6} \]


The same reasoning shows that J commutes with any rotationally invariant oper- 
ator. For instance, for any pair of vector operators V and V’, it is a consequence 
of Eq. (5.4.6) that 


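\[ [\mathbf{e}\cdot\mathbf{J},\, \mathbf{V}\cdot\mathbf{V}'] = -i\hbar\big([\mathbf{e}\times\mathbf{V}]\cdot\mathbf{V}' + \mathbf{V}\cdot[\mathbf{e}\times\mathbf{V}']\big) = 0 , \]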


just as in Eq. (5.4.2). In particular, as long as the Hamiltonian is rotationally 
invariant it commutes with J, 


\[ [\mathbf{J}, H] = 0 , \tag{5.4.7} \]


so, according to Eq. (5.3.26) angular momentum is conserved, in the sense that 
any expectation value of J is time-independent. 

The requirement that the product e- J in Eq. (5.4.3) should not depend on the 
orientation of the coordinate axes implies that the operator J is itself a vector 
and hence satisfies Eq. (5.4.6); 


\[ [\mathbf{e}\cdot\mathbf{J},\, \mathbf{J}] = -i\hbar\, \mathbf{e}\times\mathbf{J} \tag{5.4.8} \]


for any e. From the coefficients of the different components of e in this equation, 
we easily find the equivalent commutation relations: 


\[ [J_1, J_2] = i\hbar J_3 , \qquad [J_2, J_3] = i\hbar J_1 , \qquad [J_3, J_1] = i\hbar J_2 . \tag{5.4.9} \]
(For instance, the 2-component of Eq. (5.4.8) is 
\[ [\mathbf{e}\cdot\mathbf{J},\, J_2] = -i\hbar\,(\mathbf{e}\times\mathbf{J})_2 = -i\hbar\,[e_3 J_1 - e_1 J_3] , \]
in which the coefficient of $e_1$ is $[J_1, J_2] = i\hbar J_3$.) Also, $\mathbf{J}^2$ like any other scalar 
commutes with $\mathbf{J}$: 


\[ [J_i, \mathbf{J}^2] = 0 . \tag{5.4.10} \]


As we shall see, it is the commutation relations (5.4.9) that determine the 
possible values of $\mathbf{J}^2$ and the possible values of $J_3$ for a given $\mathbf{J}^2$. 


Spin and Orbital Angular Momenta 


The discussion so far in this section may produce some sense of déjà vu. We 
saw in Eqs. (5.3.30) and (5.3.31) that the orbital angular momentum L has 
commutators just like (5.4.6) with coordinates and momenta and hence with 
any vector V formed from coordinates and momenta: 


\[ [\mathbf{e}\cdot\mathbf{L},\, \mathbf{V}] = -i\hbar\, \mathbf{e}\times\mathbf{V} . \tag{5.4.11} \]


Since L is itself a vector formed from coordinates and momenta, this also 
applies with V = L, and hence 


\[ [L_1, L_2] = i\hbar L_3 , \qquad [L_2, L_3] = i\hbar L_1 , \qquad [L_3, L_1] = i\hbar L_2 , \tag{5.4.12} \]


just like the commutators of the components of J. Of course, we can also 
calculate these commutators directly from the commutators of momentum and 
position operators. For instance, 


\[ \begin{aligned}
[L_1, L_2] &= [X_2P_3 - X_3P_2,\; X_3P_1 - X_1P_3] \\
           &= X_2P_1[P_3, X_3] + P_2X_1[X_3, P_3] \\
           &= i\hbar(-X_2P_1 + P_2X_1) = i\hbar L_3 .
\end{aligned} \]


But this does not mean that J = L. Instead, we can consider the possibility 
that 


\[ \mathbf{J} = \mathbf{L} + \mathbf{S} , \tag{5.4.13} \]


where S, known as the spin, is some operator whose properties we will now 
work out. 

First, because J satisfies Eq. (5.4.6) for any vector operator V, and L satisfies 
Eq. (5.4.11) for any vector operator V formed from positions and momenta, the 
difference of these equations tells us that 


\[ [S_i, V_j] = 0 \tag{5.4.14} \]


for any vector operator V formed from positions and momenta. The spin opera- 
tor has nothing to do with positions and momenta. 
In particular, since L is a vector formed from positions and momenta, 


\[ [S_i, L_j] = 0 . \tag{5.4.15} \]
It follows that 
\[ [J_i, J_j] = [L_i, L_j] + [S_i, S_j] , \]


so the $S_i$ satisfy the same commutation relations with each other as in Eq. (5.4.9) 
for $\mathbf{J}$ and Eq. (5.4.12) for $\mathbf{L}$: 


\[ [S_1, S_2] = i\hbar S_3 , \qquad [S_2, S_3] = i\hbar S_1 , \qquad [S_3, S_1] = i\hbar S_2 . \tag{5.4.16} \]


Multiplets 


We next show how to use the commutation relations (5.4.9) to find the allowed 
values of $\mathbf{J}^2$ and the range of allowed values of $J_3$ for a given $\mathbf{J}^2$. Though 
presented here for the total angular momentum $\mathbf{J}$, precisely the same reasoning 
and corresponding results apply to any angular momentum operators with 
corresponding commutation relations, such as the orbital angular momentum 
vector $\mathbf{L}$ that satisfies Eq. (5.4.12) and the spin angular momentum vector $\mathbf{S}$ that 
satisfies Eq. (5.4.16). 

First, we note that 


\[ [J_3,\, (J_1 \pm iJ_2)] = i\hbar J_2 \pm i(-i\hbar J_1) = \pm\hbar\,(J_1 \pm iJ_2) . \tag{5.4.17} \]


Therefore $J_1 \pm iJ_2$ act as raising and lowering operators: for a wave function 
$\psi^m$ that satisfies the eigenvalue condition $J_3\psi^m = \hbar m\,\psi^m$ (with any $m$), we 
have 


\[ J_3\,(J_1 \pm iJ_2)\,\psi^m = (m \pm 1)\hbar\,(J_1 \pm iJ_2)\,\psi^m , \]


so if $(J_1 \pm iJ_2)\psi^m$ does not vanish then it is an eigenfunction of $J_3$ with 
eigenvalue $\hbar(m \pm 1)$. Since $\mathbf{J}^2$ commutes with $J_3$, we can choose $\psi^m$ to be 
an eigenfunction of $\mathbf{J}^2$ as well as $J_3$, and, since $\mathbf{J}^2$ commutes with $(J_1 \pm iJ_2)$, 
all the wave functions that are connected with each other by lowering and/or 
raising operators will have the same eigenvalue for $\mathbf{J}^2$. We say that such wave 
functions form an angular momentum multiplet. 

Now, there must be a maximum and a minimum to the eigenvalues of $J_3$ that 
can be reached in this way, because the square of any eigenvalue of $J_3$ is 
necessarily not more than the eigenvalue of $\mathbf{J}^2$. The reason is that for any wave 
function $\psi$ that has an eigenvalue $a$ for $J_3$ and an eigenvalue $b$ for $\mathbf{J}^2$, we have 


\[ b - a^2 = \langle(\mathbf{J}^2 - J_3^2)\rangle = \langle(J_1^2 + J_2^2)\rangle \ge 0 . \]


It is conventional to define a quantity $j$ as the maximum value of the eigenvalues 
of $J_3/\hbar$ for a particular multiplet of wave functions that are related by raising 
and lowering operators. We will also temporarily define $j'$ as the minimum 
eigenvalue of $J_3/\hbar$ for these wave functions. The wave function $\psi^j$ for which 
$J_3$ takes its maximum eigenvalue $\hbar j$ must satisfy 


\[ (J_1 + iJ_2)\,\psi^j = 0 , \tag{5.4.18} \]
since otherwise $(J_1 + iJ_2)\psi^j$ would be a wave function with a larger eigenvalue 
of $J_3$. Likewise, acting on the wave function $\psi^j$ with $(J_1 - iJ_2)$ gives an 
eigenfunction of $J_3$ with eigenvalue $\hbar(j - 1)$, unless of course this wave function 
vanishes. Continuing in this way, we must eventually get to a wave function $\psi^{j'}$ 
with the minimum eigenvalue $\hbar j'$ of $J_3$, which satisfies 


\[ (J_1 - iJ_2)\,\psi^{j'} = 0 , \tag{5.4.19} \]


since otherwise $(J_1 - iJ_2)\psi^{j'}$ would be a wave function with an even smaller 
eigenvalue of $J_3$. We get to $\psi^{j'}$ from $\psi^j$ by applying the lowering operator 
$(J_1 - iJ_2)$ a whole number of times, so $j - j'$ must be a whole number. 

To go further, we use the commutation relations of $J_1$ and $J_2$ to show that 


\[ (J_1 - iJ_2)(J_1 + iJ_2) = J_1^2 + J_2^2 + i[J_1, J_2] = \mathbf{J}^2 - J_3^2 - \hbar J_3 , \tag{5.4.20} \]
\[ (J_1 + iJ_2)(J_1 - iJ_2) = J_1^2 + J_2^2 - i[J_1, J_2] = \mathbf{J}^2 - J_3^2 + \hbar J_3 . \tag{5.4.21} \]
According to Eq. (5.4.18), the operator (5.4.20) gives zero when acting on 
$\psi^j$, so 
\[ \mathbf{J}^2\psi^j = \hbar^2 j(j + 1)\,\psi^j . \tag{5.4.22} \]


On the other hand, according to Eq. (5.4.19) the operator (5.4.21) gives zero 
when acting on $\psi^{j'}$, so 


\[ \mathbf{J}^2\psi^{j'} = \hbar^2 j'(j' - 1)\,\psi^{j'} . \tag{5.4.23} \]


But all these wave functions are eigenfunctions of $\mathbf{J}^2$ with the same eigenvalue, 
so $j'(j' - 1) = j(j + 1)$. This quadratic equation for $j'$ has two solutions, 
$j' = j + 1$ and $j' = -j$. The first solution is impossible, because $j'$ is the 
minimum eigenvalue of $J_3/\hbar$ and therefore cannot be greater than the maximum 
eigenvalue $j$. This leaves us with the other solution, 


\[ j' = -j . \tag{5.4.24} \]


But we saw that $j - j' = 2j$ must be a non-negative integer, so $j$ must be a 
non-negative integer or half integer. The eigenvalues of $J_3$ range over the $2j + 1$ 
values of $\hbar m$ with $m$ running by unit steps from $-j$ to $+j$. The corresponding 
eigenfunctions will be denoted $\psi_j^m$, so that 


\[ J_3\,\psi_j^m = \hbar m\,\psi_j^m , \qquad m = -j,\, -j+1,\, \ldots,\, +j , \tag{5.4.25} \]
\[ \mathbf{J}^2\psi_j^m = \hbar^2 j(j + 1)\,\psi_j^m . \tag{5.4.26} \]


These are the same eigenvalues as those we found in the previous section in 
the case of orbital angular momentum, with the one big difference that j and m 
may be half-integers rather than integers. This justifies the guess of Goudsmit 
and Uhlenbeck that electrons could have an intrinsic angular momentum with 
$j = 1/2$, but that is the end of the surprises: we see that it is not possible to have 
physical systems with weird angular momenta such as $j = 1/3$, $j = 1/4$, etc. 

Using these results, we can work out the action of any component of $\mathbf{J}$ on 
these multiplets. Because $J_1 - iJ_2$ is a lowering operator, we must have 


\[ (J_1 - iJ_2)\,\psi_j^m = \alpha_{jm}\,\psi_j^{m-1} , \]


where the $\alpha_{jm}$ are various constants that depend on how the wave functions are 
normalized. If we assume that $\psi_j^m$ and $\psi_j^{m-1}$ both have unit norm, then, using 
Eq. (5.4.21), 


\[ |\alpha_{jm}|^2 = \int \psi_j^{m*}\,(J_1 + iJ_2)(J_1 - iJ_2)\,\psi_j^m = \int \psi_j^{m*}\,(\mathbf{J}^2 - J_3^2 + \hbar J_3)\,\psi_j^m
     = \hbar^2\,[\,j(j + 1) - m(m - 1)\,] . \]


We can adjust the phases of these states so that all $\alpha_{jm}$ are real and positive and 
so that $\alpha_{jm} = \hbar\sqrt{j(j + 1) - m(m - 1)}$; hence 


\[ (J_1 - iJ_2)\,\psi_j^m = \hbar\sqrt{j(j + 1) - m(m - 1)}\;\psi_j^{m-1} . \tag{5.4.27} \]
The same analysis shows that, with this choice of phases, 
\[ (J_1 + iJ_2)\,\psi_j^m = \hbar\sqrt{j(j + 1) - m(m + 1)}\;\psi_j^{m+1} . \tag{5.4.28} \]


So now we know how $J_1$ and $J_2$ as well as $J_3$ act on angular momentum 
multiplets. 
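
Equations (5.4.25), (5.4.27), and (5.4.28) are enough to write $J_1$, $J_2$, and $J_3$ as explicit matrices for any $j$, and one can then confirm numerically that they reproduce the commutators (5.4.9) and the eigenvalue (5.4.26). The sketch below is only an illustration under stated assumptions: it sets $\hbar = 1$, orders the basis as $m = j, j-1, \ldots, -j$, and uses helper names (`matmul`, `angular_momentum_matrices`, `check_multiplet`) invented here.

```python
from fractions import Fraction
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def angular_momentum_matrices(j):
    """Return (J1, J2, J3) in the basis m = j, j-1, ..., -j, with hbar = 1."""
    j = Fraction(j)
    dim = int(2 * j) + 1
    ms = [j - k for k in range(dim)]
    # Lowering operator from (5.4.27): takes m to m - 1 with
    # coefficient sqrt(j(j+1) - m(m-1)).
    lower = [[0j] * dim for _ in range(dim)]
    for k in range(dim - 1):
        m = ms[k]
        lower[k + 1][k] = complex(math.sqrt(j * (j + 1) - m * (m - 1)))
    # Raising operator (5.4.28) is the adjoint; entries are real, so transpose.
    upper = [[lower[c][r] for c in range(dim)] for r in range(dim)]
    j1 = [[(upper[r][c] + lower[r][c]) / 2 for c in range(dim)] for r in range(dim)]
    j2 = [[(upper[r][c] - lower[r][c]) / 2j for c in range(dim)] for r in range(dim)]
    j3 = [[complex(float(ms[r])) if r == c else 0j for c in range(dim)]
          for r in range(dim)]
    return j1, j2, j3

def check_multiplet(j):
    """Return (largest violation of (5.4.9), largest deviation of J^2 from j(j+1))."""
    ops = angular_momentum_matrices(j)
    dim = len(ops[0])
    worst_commutator = 0.0
    for a, b, c in ((0, 1, 2), (1, 2, 0), (2, 0, 1)):
        ab, ba = matmul(ops[a], ops[b]), matmul(ops[b], ops[a])
        for r in range(dim):
            for s in range(dim):
                # [J_a, J_b] should equal i * J_c when hbar = 1
                diff = ab[r][s] - ba[r][s] - 1j * ops[c][r][s]
                worst_commutator = max(worst_commutator, abs(diff))
    squares = [matmul(op, op) for op in ops]
    target = float(Fraction(j) * (Fraction(j) + 1))
    worst_casimir = 0.0
    for r in range(dim):
        for s in range(dim):
            total = sum(sq[r][s] for sq in squares)
            expected = target if r == s else 0.0
            worst_casimir = max(worst_casimir, abs(total - expected))
    return worst_commutator, worst_casimir

for j in (Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2)):
    print(j, check_multiplet(j))
```

For $j = 1/2$ the matrices returned are one half of the Pauli matrices, so the half-integer case needs no separate treatment in the code.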

A particle of species $n$ with eigenvalue $\hbar^2 s_n(s_n + 1)$ for $\mathbf{S}^2$ is said to have spin 
$s_n$. Electrons, muons, neutrinos, and quarks have spin 1/2; W and Z particles 
have spin 1; and Higgs particles have spin 0. The concept of spin is not limited 
to so-called elementary particles. Protons and neutrons are each composites of 
three quarks, and some of their intrinsic angular momentum comes from the 
orbital motion of these quarks, but the energies in an atomic nucleus are not 
high enough for us to probe the internal structure of the proton and neutron, and 
so we refer to their total angular momentum as spin 1/2. Likewise, the energies 
in an atom are not high enough for us to probe the internal structure of their 
nuclei, and so we refer to the intrinsic angular momentum of these nuclei as a 
spin. The deuteron has spin 1; the $^3$He and $^4$He nuclei have spins 1/2 and 0, 
respectively; and so on. 

The wave function for a multi-particle state depends on the 3-components $\sigma_n$ 
of the individual spin vectors $\mathbf{S}_n/\hbar$, so the wave function must be labeled by 
these spin 3-components as well as by coordinates and will be written 


\[ \psi(\mathbf{x}_1, \sigma_1;\; \mathbf{x}_2, \sigma_2;\; \ldots) . \]
The values of $\sigma_n$ run over the $2s_n + 1$ values from $-s_n$ to $+s_n$. In place of 
Eq. (5.3.8), the scalar product of two wave functions for systems of particles 
with spin includes a sum over all these $\sigma_n$: 


\[ \int \psi_a^*\psi_b = \sum_{\sigma_1}\sum_{\sigma_2}\cdots \int d^3x_1 \int d^3x_2 \cdots\;
   \psi_a^*(\mathbf{x}_1, \sigma_1;\, \mathbf{x}_2, \sigma_2;\, \ldots)\, \psi_b(\mathbf{x}_1, \sigma_1;\, \mathbf{x}_2, \sigma_2;\, \ldots) . \tag{5.4.29} \]


The spin operator S does not act on the coordinate arguments, but produces 
linear combinations of wave functions with various values of the $\sigma_n$. (Instead of 
the 3-component of angular momentum, we can label states with the helicity, 
the component of angular momentum in the direction of motion in units of $\hbar$. 
Photon states have only helicity $\pm 1$, corresponding to the two states of circular 
polarization.) 


Adding Angular Momenta 


Physical systems typically involve angular momenta of various sorts. Even in 
hydrogen there are the orbital and spin angular momenta of the electron, and 
also a proton spin very weakly coupled to the electron. In more complicated 
atoms there is more than one orbital and spin electron angular momentum, as 
well as a nuclear spin. But rotational invariance only ensures that the states of 
definite energy can also be chosen to have definite values for $\mathbf{J}^2$ and $J_3$ (where $\mathbf{J}$ 
is the total angular momentum), which are required by rotational invariance to 
commute with the Hamiltonian. This is one reason why it is important to know 
what total angular momenta arise when we combine different angular momenta 
in the same system. 

Suppose a system involves two different angular momenta $\mathbf{J}_a$ and $\mathbf{J}_b$. These 
may be the spin and/or orbital angular momenta of a single particle, or sums 
of various spins and/or angular momenta of a number of particles. Suppose the 
wave function is an eigenfunction of $\mathbf{J}_a^2$ and $\mathbf{J}_b^2$ with eigenvalues $\hbar^2 j_a(j_a + 1)$ 
and $\hbar^2 j_b(j_b + 1)$, respectively. We can define wave functions $\psi_{j_a j_b}^{m_a m_b}$ as 
eigenfunctions also of $J_{a3}$ and $J_{b3}$ with eigenvalues $\hbar m_a$ and $\hbar m_b$, respectively. 
We then have the following problems: what linear combinations of these wave 
functions are eigenfunctions of $\mathbf{J}^2$ and $J_3$ (where $\mathbf{J} = \mathbf{J}_a + \mathbf{J}_b$) and what are the 
corresponding eigenvalues $\hbar^2 j(j + 1)$ and $\hbar m$ of $\mathbf{J}^2$ and $J_3$? 

The “stretched” wave function $\psi_{j_a j_b}^{j_a j_b}$ with the maximum possible eigenvalues 
for $J_{a3}$ and $J_{b3}$ is an eigenfunction of $J_3$ with $m = j_a + j_b$, so it must also be an 
eigenfunction of $\mathbf{J}^2$ with $j = j_a + j_b$. It could not have a lower value of $j$ with 
this value of $m$, and there are no wave functions with larger values of $j$ since 
there are none with larger values of $m$. 
Next, consider the wave functions $\psi_{j_a j_b}^{j_a - 1,\, j_b}$ and $\psi_{j_a j_b}^{j_a,\, j_b - 1}$. Both are eigenfunctions 
of $J_3$ with $m = j_a + j_b - 1$. One linear combination of these must be the 
member of the $j = j_a + j_b$ multiplet with $m = j_a + j_b - 1$; the other then has 
to be a member of some other multiplet, which by the same reasoning as before 
must have $j = j_a + j_b - 1$. 

We can continue in this way, with one multiplet for each $j = j_a + j_b$, 
$j = j_a + j_b - 1$, $j = j_a + j_b - 2$, and so on. After $\nu$ steps, with $m = j_a + j_b - \nu$, there 
are $\nu + 1$ choices of $m_a$ running up from $j_a - \nu$ to $j_a$, and $m_b = m - m_a$, running 
down from $j_b$ to $j_b - \nu$, with one new multiplet having $j = j_a + j_b - \nu$ for each 
increase in $\nu$. But this ends with $\nu = 2j_b$ (taking $j_a \ge j_b$), for which $m_b$ runs 
from $j_b$ to $-j_b$. At the next step, with $\nu = 2j_b + 1$, we would only get a new 
multiplet if $m_b$ could run from $j_b$ down to $-j_b - 1$, which is impossible since we 
can only have $|m_b| \le j_b$. So when $j_a \ge j_b$, the lowest value of $j$ that is found 
in the addition of angular momenta $j_a$ and $j_b$ is $j = j_a + j_b - 2j_b = j_a - j_b$. 
Of course, in the same way, if $j_b \ge j_a$, the lowest value of $j$ is $j_b - j_a$. So in 
this way, we construct one multiplet for each $j$ in the range 

\[ j = j_a + j_b , \quad j = j_a + j_b - 1 , \quad j = j_a + j_b - 2 , \quad \ldots , \quad j = |j_a - j_b| . \tag{5.4.30} \]
This is the general rule for adding angular momenta. 
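
A quick consistency check on the rule (5.4.30) is that the multiplets it produces contain exactly as many states as there are products of the $2j_a + 1$ and $2j_b + 1$ original wave functions. The following sketch enumerates the allowed $j$ and compares the two counts; the function names are illustrative choices, and exact fractions are used so that half-integers cause no rounding trouble.

```python
from fractions import Fraction

def allowed_total_j(ja, jb):
    """List the values of j in (5.4.30), from ja + jb down to |ja - jb|."""
    ja, jb = Fraction(ja), Fraction(jb)
    values = []
    j = ja + jb
    while j >= abs(ja - jb):
        values.append(j)
        j -= 1
    return values

def count_states(ja, jb):
    """Return (number of product states, number of states in all multiplets)."""
    ja, jb = Fraction(ja), Fraction(jb)
    product_states = (2 * ja + 1) * (2 * jb + 1)
    multiplet_states = sum(2 * j + 1 for j in allowed_total_j(ja, jb))
    return product_states, multiplet_states

for ja, jb in ((Fraction(1, 2), Fraction(1, 2)), (1, Fraction(1, 2)),
               (1, 1), (Fraction(3, 2), 2)):
    values = [str(j) for j in allowed_total_j(ja, jb)]
    print(ja, jb, values, count_states(ja, jb))
```

The two counts agree in every case, as they must if the multiplets in (5.4.30) are to exhaust the product wave functions.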

The linear combination of wave functions $\psi_{j_a j_b}^{m_a m_b}$ with a definite value for $j$ 
and $m$ is conventionally written as 


\[ \psi_{j_a j_b j}^{m} = \sum_{m_a = -j_a}^{+j_a}\; \sum_{m_b = -j_b}^{+j_b} C_{j_a j_b}(j\, m;\, m_a\, m_b)\; \psi_{j_a j_b}^{m_a m_b} , \tag{5.4.31} \]
where the $C_{j_a j_b}(j\, m;\, m_a\, m_b)$ are real numerical coefficients known as 
Clebsch-Gordan coefficients. Because $J_3 = J_{a3} + J_{b3}$, the Clebsch-Gordan 
coefficients are non-zero only for $m = m_a + m_b$. 

In the appendix to this section we shall work this out in a simple case, the 
combination in hydrogen of the electron’s integer orbital angular momentum 
$\ell$ and its spin angular momentum with $s = 1/2$. We then provide a table of 
Clebsch—Gordan coefficients for various low values of jg and jp. 


Fine Structure and Space Inversion 


Let us apply what we have learned to alkali metals and hydrogen. In both 
cases the observed spectrum arises from transitions involving a single “valence” 
electron moving in an essentially spherically symmetric potential — the potential 
of the nucleus for hydrogen or the potential of the nucleus and tightly bound 
inner electrons for alkali metals. The total angular momentum J is then the 
sum of the valence electron’s orbital angular momentum L and that electron’s 
spin S. Since J commutes with the Hamiltonian, we can take the wave functions 
of definite energy also to have a definite value $\hbar^2 j(j + 1)$ of $\mathbf{J}^2$ and a value 
$\hbar m$ of $J_3$, with $m$ running from $-j$ to $+j$. Now, as we have just seen, these 
wave functions would in general be linear combinations of wave functions with 
$j = \ell + 1/2$ and $j = \ell - 1/2$ (where $\ell$ is defined by $\mathbf{L}^2 = \hbar^2\ell(\ell + 1)$), so a 
wave function of definite energy and $j$ would in general be a linear combination 
of wave functions with both $\ell = j - 1/2$ and $\ell = j + 1/2$. 

But in fact these states can be chosen to have definite values of $\ell$, because 
there is another conserved quantity. We can define a space reflection operator $\Pi$ 
by the condition that, for any wave function $\psi$, 


\[ [\Pi\psi](\mathbf{x}_1, \sigma_1;\; \mathbf{x}_2, \sigma_2;\; \ldots) = \psi(-\mathbf{x}_1, \sigma_1;\; -\mathbf{x}_2, \sigma_2;\; \ldots) . \tag{5.4.32} \]


This is not a rotation. By a rotation of 180° around the z-axis we can change 
the signs of x and y but there is no rotation that changes the signs of all three 
components of a 3-vector. It is easy to see that the operator defined in this way 
has the properties 


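\[ \Pi^2 = 1 , \qquad \Pi^\dagger = \Pi \tag{5.4.33} \]
and (now considering just a single particle) 
\[ \Pi\mathbf{X}\Pi = -\mathbf{X} , \qquad \Pi\mathbf{P}\Pi = -\mathbf{P} . \tag{5.4.34} \]
So, we also have 
\[ \Pi\mathbf{L}\Pi = +\mathbf{L} . \tag{5.4.35} \]


The defining condition (5.4.6) for the total angular momentum operator $\mathbf{J}$, 
\[ [\mathbf{e}\cdot\mathbf{J},\, \mathbf{V}] = -i\hbar\, \mathbf{e}\times\mathbf{V} , \]
is also satisfied by $\Pi\mathbf{J}\Pi$ as long as $\Pi\mathbf{V}\Pi = \pm\mathbf{V}$, so 
\[ \Pi\mathbf{J}\Pi = +\mathbf{J} \tag{5.4.36} \]
and then also 
\[ \Pi\mathbf{S}\Pi = +\mathbf{S} . \tag{5.4.37} \]
The operator $\Pi$ commutes with the Hamiltonian: 
\[ \Pi H = H\Pi \tag{5.4.38} \]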


at least for Hamiltonians of the form encountered in atomic and molecular 
physics, even if we include spin-orbit coupling terms, proportional to S - L. 
It follows that we can choose the states of definite energy so that their wave 
functions are also eigenfunctions of $\Pi$: 


\[ \Pi\psi = \pi\psi . \]


Because $\Pi^2 = 1$ the eigenvalue $\pi$, known as the parity of the state, can only be 
$+1$ or $-1$. Indeed, given a wave function $\psi$ for which $H\psi = E\psi$ that is not an 
eigenfunction of $\Pi$, we can always write it as a superposition $\psi = \psi_+ + \psi_-$ 
where $\psi_\pm = (1 \pm \Pi)\psi/2$. Since $\Pi$ commutes with $H$ these satisfy $H\psi_\pm = 
E\psi_\pm$, and since $\Pi^2 = 1$ they satisfy $\Pi\psi_\pm = \pm\psi_\pm$. 
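
The decomposition into parts of definite parity is easy to see in a toy setting. The sketch below is a one-dimensional stand-in, which is an assumption made purely for illustration (the text deals with three-dimensional, many-particle wave functions): the reflection operator acts as $x \to -x$ on an arbitrary test function, and `parity_parts` is a name invented here.

```python
import math

def parity_parts(psi):
    """Split a function psi(x) into its parts with parity +1 and -1."""
    def even(x):
        return 0.5 * (psi(x) + psi(-x))    # (1 + Pi) psi / 2
    def odd(x):
        return 0.5 * (psi(x) - psi(-x))    # (1 - Pi) psi / 2
    return even, odd

def psi(x):
    """An arbitrary test function with no definite parity."""
    return math.exp(-(x - 1.0) ** 2)

psi_plus, psi_minus = parity_parts(psi)

x = 0.7
print(psi_plus(x), psi_plus(-x))          # equal: eigenvalue +1 under reflection
print(psi_minus(x), -psi_minus(-x))       # equal: eigenvalue -1 under reflection
print(psi_plus(x) + psi_minus(x), psi(x)) # the two parts add back up to psi
```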

As we saw in Section 5.2 for general spherically symmetric potentials, a one- 
particle state with $\mathbf{L}^2 = \hbar^2\ell(\ell + 1)$ has wave function $\psi(\mathbf{x})$ proportional to a 
homogeneous polynomial of order $\ell$ in the coordinate $\mathbf{x}$, on which the operator 
$\Pi$ gives a factor $(-1)^\ell$, so the states with definite energy can be taken to have $\ell$ 
either even or odd. For $j - 1/2$ even or odd we have $j + 1/2$ respectively odd 
or even, and hence the states with definite energy and $j$ can be taken to have a 
definite $\ell$, either $j - 1/2$ or $j + 1/2$. These states are therefore labeled 


\[ 1s_{1/2},\; 2s_{1/2},\; 2p_{1/2},\; 2p_{3/2},\; 3s_{1/2},\; 3p_{1/2},\; 3p_{3/2},\; 3d_{3/2},\; 3d_{5/2},\; \ldots \]


where again the letters s, p, d, etc., stand for $\ell = 0, 1, 2$, etc.; the integer in 
front of the letter is the principal quantum number $n$, defined so that the number 
of nodes of the wave function is $n - \ell - 1$ (and therefore $\ell \le n - 1$); and 
now the subscript gives the value of $j$. The energy depends on $j$ as well as 
on $n$ and (except for hydrogen) on $\ell$, with the $j$ dependence arising both from 
relativistic corrections and the magnetic coupling of the electron’s spin with 
the orbital motion, but this dependence is rather weak and just gives rise to the 
fine structure of the energy levels. The difference in the energies of the $3p_{1/2}$ 
and $3p_{3/2}$ states of sodium splits the wavelengths of the $3p_{1/2} \to 3s_{1/2}$ and 
$3p_{3/2} \to 3s_{1/2}$ transitions by just 1.02 parts per thousand, while the energies of 
the $2p_{1/2}$ and $2p_{3/2}$ states of hydrogen differ by only 4.44 parts per million. 
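
The size of the hydrogen number can be reproduced with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in the text: it uses the standard result of Dirac's theory that the $2p_{3/2}$ and $2p_{1/2}$ levels are separated by $\alpha^4 m_e c^2/32$, and it takes "parts per million" to mean the ratio of this splitting to the energy $3\alpha^2 m_e c^2/8$ of the $2p \to 1s$ transition. The numerical constants are rounded reference values.

```python
# Rounded reference values (assumed here, not taken from the text)
alpha = 1 / 137.035999                 # fine-structure constant
electron_rest_energy_eV = 510998.95    # m_e c^2 in eV

# Standard Dirac-theory splitting between 2p(3/2) and 2p(1/2)
splitting_eV = alpha ** 4 * electron_rest_energy_eV / 32

# Energy of the 2p -> 1s transition, (3/4) of the Rydberg energy alpha^2 m c^2 / 2
transition_eV = 3 * alpha ** 2 * electron_rest_energy_eV / 8

ratio = splitting_eV / transition_eV   # algebraically equal to alpha**2 / 12

print("splitting (eV):", splitting_eV)
print("2p -> 1s transition energy (eV):", transition_eV)
print("ratio in parts per million:", ratio * 1e6)
```

With this reading the ratio is $\alpha^2/12$, which comes out close to the 4.44 parts per million quoted above.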

The hydrogen fine structure was first calculated in 1928 by Dirac in a 
relativistic version of wave mechanics.¹⁰ The relativistic and spin effects that he 
calculated left the $2p_{1/2}$ state with the same energy as the $2s_{1/2}$ state. Physicists 
including Hans Kramers (1894-1952) and Victor Weisskopf (1908-2002) 
realized in the 1930s that quantum electrodynamic effects such as the emission 
and reabsorption of photons by the orbiting electron would split the energies 
of the $2p_{1/2}$ and $2s_{1/2}$ states, but the calculation proved difficult. This splitting 
was first measured after the war by Willis Lamb (1913-2008) and R. C. 
Retherford,¹¹ and is known as the Lamb shift. It is very small, $4.3515 \times 10^{-6}$ eV, 
about a tenth of the small fine-structure splitting between the $2p_{1/2}$ and $2p_{3/2}$ 
states. The successful calculation of the Lamb shift in 1949 by Norman Kroll 
(1922-2004) and Lamb¹² and by J. B. French and Weisskopf¹³ marked the 
beginning of the modern understanding of quantum electrodynamics. 

Any particle at rest, whether elementary or not, will have what is called 
an intrinsic parity $\pi_n$ that depends only on the type $n$ of the particle. If 


10 P. A. M. Dirac, Proc. Roy. Soc. A 117, 619 (1928). 

11 W. E. Lamb, Jr. and R. C. Retherford, Phys. Rev. 72, 241 (1947). 
12 N. M. Kroll and W. E. Lamb, Phys. Rev. 75, 388 (1949). 

13 J. B. French and V. F. Weisskopf, Phys. Rev. 75, 1240 (1949). 
the particle is in a state with orbital angular momentum $\ell$, the parity of 
its state is $(-1)^\ell \pi_n$. In our discussion above we have implicitly taken the 
electron to have positive intrinsic parity. This is a matter of definition; if the 
electron had negative intrinsic parity we could redefine the parity operator as 
$\Pi' = \exp(i\pi Q/e)\,\Pi$, where $Q$ is the operator for total electric charge. The 
one-electron state is an eigenstate of $Q$ with eigenvalue $-e$, so it is an 
eigenstate of $\exp(i\pi Q/e)$ with eigenvalue $-1$; if it were an eigenstate of 
$\Pi$ with eigenvalue $-1$ it would be an eigenstate of $\Pi'$ with eigenvalue $+1$. 
Since $Q$ as well as $\Pi$ commutes with the Hamiltonian, so does $\Pi'$ and 
it can be called the operator of space inversion just as well as $\Pi$. In the 
same way, because of the conservation of another quantity known as baryon 
number (described in Section 6.2) we can define the parity of the proton 
as +1. But the intrinsic parities of most particles have to be determined 
experimentally. 


Hyperfine Structure 


We must not forget the atomic nucleus, for if it has spin this produces a 
magnetic field felt by orbiting electrons. This effect is most important for the 
s-wave electrons that are not prevented from getting close to the nucleus by 
the centrifugal barrier that is present for $\ell \neq 0$. In hydrogen the spin 1/2 of 
the nucleus combines with the spin 1/2 of the electron in its $\ell = 0$ ground state 
to split the energy of the ground state into components with total spin $s = 0$ and 
$s = 1$, separated in energy by $5.9 \times 10^{-6}$ eV. The transition between these states 
produces the famous 21-cm absorption and emission spectral lines, discussed 
in Section 3.5. 
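
The connection between the quoted energy splitting and the name of the line is a one-line conversion, $\lambda = hc/\Delta E$. A minimal sketch follows, using the splitting of $5.9 \times 10^{-6}$ eV from the text and standard values of $h$ and $c$; the variable names are illustrative.

```python
PLANCK_EV_S = 4.135667696e-15      # Planck constant h in eV s
LIGHT_SPEED_M_S = 299792458.0      # speed of light c in m/s

hyperfine_splitting_eV = 5.9e-6    # value quoted in the text

frequency_hz = hyperfine_splitting_eV / PLANCK_EV_S
wavelength_cm = 100.0 * LIGHT_SPEED_M_S / frequency_hz

print("frequency (GHz):", frequency_hz / 1e9)
print("wavelength (cm):", wavelength_cm)
```

The wavelength comes out at about 21 cm, with a frequency a little above 1.4 GHz.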


Appendix: Clebsch—Gordan Coefficients 


First, as an example of some intrinsic importance, let us work out how to form 
hydrogen wave functions with definite total angular momentum from wave 
functions with definite 3-components of spin and orbital angular momentum. 
Consider the “stretched” hydrogen wave function in which $L_3$ and $S_3$ are both 
as large as possible, having eigenvalues $+\hbar\ell$ and $+\hbar/2$, respectively. In general 
we shall label hydrogen wave functions with orbital angular momentum $\ell$ and 
spin 1/2 and definite values $\hbar m$ and $\hbar\sigma$ for $L_3$ and $S_3$ as $\psi_{\ell,1/2}^{m,\sigma}$, so this stretched 
wave function is denoted $\psi_{\ell,1/2}^{\ell,+1/2}$. For this wave function, $J_3 = \hbar(\ell + 1/2)$ is 
also as large as possible. This is therefore a wave function with $j = \ell + 1/2$ 
(where as usual $j$ is defined so that the eigenvalue of $\mathbf{J}^2$ is $\hbar^2 j(j + 1)$). This 
wave function could not have a larger $j$ because then there would be states with 
$J_3 > \hbar(\ell + 1/2)$, and it could not have a smaller $j$ because then $J_3$ could not be 
as large as $\hbar(\ell + 1/2)$. In general we shall label hydrogen wave functions with 
orbital angular momentum $\ell$ and spin 1/2 and definite values $\hbar^2 j(j + 1)$ and 
$\hbar M$ for $\mathbf{J}^2$ and $J_3$ as $\psi_{\ell,1/2,j}^{M}$. So we have 


\[ \psi_{\ell,1/2}^{\ell,+1/2} = \psi_{\ell,1/2,\,\ell+1/2}^{\ell+1/2} . \tag{5.4.39} \]
So far, this is pretty trivial, apparently not worth the elaborate notation. But 
now consider the wave functions with $J_3 = \hbar(\ell - 1/2)$. There are two of these, 
one of them, $\psi_{\ell,1/2}^{\ell-1,+1/2}$, with $L_3 = \hbar(\ell - 1)$ and $S_3 = +\hbar/2$ and the other, 
$\psi_{\ell,1/2}^{\ell,-1/2}$, with $L_3 = \hbar\ell$ and $S_3 = -\hbar/2$. One linear combination of these two 
can be obtained by letting the lowering operator $J_1 - iJ_2$ act on the stretched 
wave function. This is part of the same angular momentum multiplet as the 
stretched wave function $\psi_{\ell,1/2,\,\ell+1/2}^{\ell+1/2}$, with the same eigenvalue for $\mathbf{J}^2$, so is 
labeled $\psi_{\ell,1/2,\,\ell+1/2}^{\ell-1/2}$. According to Eq. (5.4.27), if properly normalized this 
wave function is given by 


\[ \hbar\sqrt{2\ell + 1}\;\psi_{\ell,1/2,\,\ell+1/2}^{\ell-1/2} = (J_1 - iJ_2)\,\psi_{\ell,1/2,\,\ell+1/2}^{\ell+1/2}
   = (L_1 - iL_2 + S_1 - iS_2)\,\psi_{\ell,1/2}^{\ell,+1/2} . \]
Orbital and spin angular momenta obey the same commutation relations as total 
angular momentum, so their lowering operators act the same way as given in 
Eq. (5.4.27) for $J_1 - iJ_2$: 


\[ (L_1 - iL_2)\,\psi_{\ell,1/2}^{\ell,+1/2} = \hbar\sqrt{2\ell}\;\psi_{\ell,1/2}^{\ell-1,+1/2} , \qquad
   (S_1 - iS_2)\,\psi_{\ell,1/2}^{\ell,+1/2} = \hbar\,\psi_{\ell,1/2}^{\ell,-1/2} , \]
and therefore 
\[ \sqrt{2\ell + 1}\;\psi_{\ell,1/2,\,\ell+1/2}^{\ell-1/2} = \sqrt{2\ell}\;\psi_{\ell,1/2}^{\ell-1,+1/2} + \psi_{\ell,1/2}^{\ell,-1/2} . \tag{5.4.40} \]

Since there are two independent wave functions with $J_3 = \hbar(\ell - 1/2)$, there 
must be another linear combination that is part of an angular momentum 
multiplet with no higher value of $J_3$ than $\hbar(\ell - 1/2)$, so this multiplet has $j = \ell - 1/2$, 
and in our notation this linear combination is $\psi_{\ell,1/2,\,\ell-1/2}^{\ell-1/2}$. Since it has a 
different value of $j$, this linear combination can be calculated by requiring it to 
be normalized and orthogonal to the one we found by acting with $J_1 - iJ_2$ on 
the stretched wave function with $L_3 = \hbar\ell$ and $S_3 = +\hbar/2$. That is (with a 
conventional choice of overall phase): 


\[ \sqrt{2\ell + 1}\;\psi_{\ell,1/2,\,\ell-1/2}^{\ell-1/2} = -\psi_{\ell,1/2}^{\ell-1,+1/2} + \sqrt{2\ell}\;\psi_{\ell,1/2}^{\ell,-1/2} . \tag{5.4.41} \]

By continued operation of the lowering operator on the wave functions (5.4.40) 
and (5.4.41), we fill out two complete multiplets, one with $j = \ell + 1/2$ and one 
with $j = \ell - 1/2$. These results can be summarized as values for the 
Clebsch-Gordan coefficients in Eq. (5.4.31): 


\[ \begin{aligned}
C_{\ell,1/2}(\ell + 1/2,\; \ell + 1/2;\; \ell,\; +1/2) &= 1 , \\
C_{\ell,1/2}(\ell + 1/2,\; \ell - 1/2;\; \ell - 1,\; +1/2) &= \sqrt{\frac{2\ell}{2\ell + 1}} , \\
C_{\ell,1/2}(\ell + 1/2,\; \ell - 1/2;\; \ell,\; -1/2) &= \sqrt{\frac{1}{2\ell + 1}} , \\
C_{\ell,1/2}(\ell - 1/2,\; \ell - 1/2;\; \ell - 1,\; +1/2) &= -\sqrt{\frac{1}{2\ell + 1}} , \\
C_{\ell,1/2}(\ell - 1/2,\; \ell - 1/2;\; \ell,\; -1/2) &= \sqrt{\frac{2\ell}{2\ell + 1}} .
\end{aligned} \]


All the Clebsch—Gordan coefficients can be calculated in this way, but life is too 
short. The best way to find Clebsch—Gordan coefficients is to look them up in 
a table. At the end of this section there is a table of these coefficients for small 
angular momenta. 
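
For readers who would rather let a machine do the lowering, here is a minimal sketch of the first step of the calculation above: it applies $J_1 - iJ_2$ to the stretched state using (5.4.27) for the orbital and spin parts separately, normalizes the result, and builds the orthogonal combination with the phase convention of (5.4.41). It sets $\hbar = 1$ and assumes $\ell \ge 1$; the representation of a state as a dictionary from $(m, \sigma)$ to amplitude, and all function names, are choices made here for illustration.

```python
from fractions import Fraction
import math

HALF = Fraction(1, 2)

def lowering_coefficient(j, m):
    """Coefficient sqrt(j(j+1) - m(m-1)) from (5.4.27), in units of hbar."""
    return math.sqrt(j * (j + 1) - m * (m - 1))

def lower_total(state, ell):
    """Apply J1 - iJ2 = (L1 - iL2) + (S1 - iS2) to a state {(m, sigma): amplitude}."""
    result = {}
    for (m, sigma), amp in state.items():
        if m > -ell:                        # orbital lowering: m -> m - 1
            key = (m - 1, sigma)
            result[key] = result.get(key, 0.0) + amp * lowering_coefficient(ell, m)
        if sigma > -HALF:                   # spin lowering: +1/2 -> -1/2
            key = (m, sigma - 1)
            result[key] = result.get(key, 0.0) + amp * lowering_coefficient(HALF, sigma)
    return result

def normalized(state):
    """Rescale the amplitudes so that their squares sum to one."""
    norm = math.sqrt(sum(a * a for a in state.values()))
    return {key: a / norm for key, a in state.items()}

def states_with_m_ell_minus_half(ell):
    """Return the j = ell + 1/2 and j = ell - 1/2 states with J3 = ell - 1/2."""
    ell = Fraction(ell)
    stretched = {(ell, HALF): 1.0}
    upper = normalized(lower_total(stretched, ell))          # Eq. (5.4.40)
    a, b = upper[(ell - 1, HALF)], upper[(ell, -HALF)]
    lower = {(ell - 1, HALF): -b, (ell, -HALF): a}           # Eq. (5.4.41)
    return upper, lower

for ell in (1, 2, 3):
    upper, lower = states_with_m_ell_minus_half(ell)
    print(ell, upper, lower)
```

The amplitudes printed can be compared directly with the closed forms $\sqrt{2\ell/(2\ell+1)}$ and $\sqrt{1/(2\ell+1)}$ listed above. Repeated calls to `lower_total` followed by `normalized` would fill out the rest of each multiplet.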

There is a symmetry property of the Clebsch—Gordan coefficients in the case 
of adding equal angular momenta that will be important for us when we come 
to diatomic molecules in the next section and to nuclear forces in Section 6.2. 
For $j_a = j_b$, 


\[ C_{j_a j_a}(j\, M;\, m_a\, m_b) = (-1)^{j - 2j_a}\; C_{j_a j_a}(j\, M;\, m_b\, m_a) . \tag{5.4.42} \]


This is trivial for the stretched configuration, where $m_a = m_b = j_a$ and 
$j = 2j_a$. It is then also valid for all the Clebsch-Gordan coefficients with the 
same value of $j$, because the corresponding states are obtained by acting on the 
stretched configuration state with the symmetric lowering operator $J_1 - iJ_2 = 
J_{a1} + J_{b1} - iJ_{a2} - iJ_{b2}$. The state with $j = 2j_a - 1$ and $m_a + m_b = 2j_a - 1$ 
is a superposition of terms with $m_a = j_a - 1$, $m_b = j_a$ and $m_a = j_a$, 
$m_b = j_a - 1$, and, since it is orthogonal to the state with $j = 2j_a$ and $m_a + m_b = 
2j_a - 1$, it must be antisymmetric in $m_a$ and $m_b$. All the other states with 
$j = 2j_a - 1$ are obtained by acting on this state with the symmetric lowering 
operator $J_1 - iJ_2$, and so are also antisymmetric in $m_a$ and $m_b$, in agreement 
with Eq. (5.4.42). Continuing, from the states with $m_a + m_b = 2j_a - 2$ we 
can form one antisymmetric combination, which is needed in the multiplet with 
$j = 2j_a - 1$, and two symmetric combinations, which can then only be in the 
multiplets with $j = j_a + j_b$ and $j = j_a + j_b - 2$. And so on. 
Table 5.1. The non-vanishing Clebsch-Gordan coefficients for the addition of 
angular momenta $j_a$ and $j_b$ with 3-components $m_a$ and $m_b$ to give angular 
momentum $j$ with 3-component $M$, for several low values of $j_a$ and $j_b$. 


j_a    j_b    j      M       m_a     m_b     C_{j_a j_b}(j M; m_a m_b)

1/2    1/2    1      +1      +1/2    +1/2    1
1/2    1/2    1      0       ±1/2    ∓1/2    1/√2
1/2    1/2    1      −1      −1/2    −1/2    1
1/2    1/2    0      0       ±1/2    ∓1/2    ±1/√2

1      1/2    3/2    ±3/2    ±1      ±1/2    1
1      1/2    3/2    ±1/2    ±1      ∓1/2    √(1/3)
1      1/2    3/2    ±1/2    0       ±1/2    √(2/3)
1      1/2    1/2    ±1/2    ±1      ∓1/2    ±√(2/3)
1      1/2    1/2    ±1/2    0       ±1/2    ∓√(1/3)

1      1      2      ±2      ±1      ±1      1
1      1      2      ±1      ±1      0       1/√2
1      1      2      ±1      0       ±1      1/√2
1      1      2      0       ±1      ∓1      1/√6
1      1      2      0       0       0       √(2/3)
1      1      1      ±1      ±1      0       ±1/√2
1      1      1      ±1      0       ±1      ∓1/√2
1      1      0      0       ±1      ∓1      1/√3
1      1      0      0       0       0       −1/√3


5.5 Bosons and Fermions 


Identical Particles 


Aside from their momenta and helicities, every photon in the universe is 
identical to every other photon. The reason is that all photons are quanta of the 
same field, the electromagnetic field. In the same way, aside from their momenta 
(or positions) and spin components, according to the modern understandings 
outlined in Chapter 7, every electron in the universe is identical to every other 
electron because they are all quanta of a single field, known as the electron 
field. The same is true of every other species of elementary particle — quarks, 
neutrinos, and so on — each is the quantum of a particular field. Indeed, our best 
current definition of an elementary particle is that it is the quantum of one of the 
fields of which the world is composed. But the same indistinguishability is true 
of composite systems in any one specific state. Two protons are indistinguish- 
able because they are each composed of three quarks of the same two different 
types in the same bound state, and two hydrogen atoms in the same atomic 
state are indistinguishable because they are each composed of an electron 
and a proton. 

In writing a wave function for identical particles as $\psi(\mathbf{x}_1, \sigma_1;\, \mathbf{x}_2, \sigma_2;\, \ldots)$, 
it is incorrect to say that for this wave function the first particle has position 
$\mathbf{x}_1$ and spin 3-component $\sigma_1$ while the second particle has position $\mathbf{x}_2$ and spin 
3-component $\sigma_2$, and so on. Instead we should say that there is a particle 
with position $\mathbf{x}_1$ and spin 3-component $\sigma_1$ and another particle with 
position $\mathbf{x}_2$ and spin 3-component $\sigma_2$, and so on. Thus for identical particles, 
$\psi(\mathbf{x}_1, \sigma_1;\, \mathbf{x}_2, \sigma_2;\, \ldots)$ and $\psi(\mathbf{x}_2, \sigma_2;\, \mathbf{x}_1, \sigma_1;\, \ldots)$ represent the same state. 
Two wave functions that represent the same state can only differ by a constant 
factor, so 


\[ \psi(\mathbf{x}_2, \sigma_2;\, \mathbf{x}_1, \sigma_1;\, \ldots) = \lambda\, \psi(\mathbf{x}_1, \sigma_1;\, \mathbf{x}_2, \sigma_2;\, \ldots) \]


for some constant $\lambda$. Integrals don’t depend on how the variables of integration 
are labeled, so it follows that $\int |\psi|^2 = |\lambda|^2 \int |\psi|^2$, and therefore $\lambda$ can only 
be a phase factor, with $|\lambda| = 1$. Further, the constant $\lambda$ cannot depend on 
position or spin 3-components without violating various symmetry principles, 
such as Galilean or Einsteinian relativity, rotational invariance, and translation 
invariance, so we can therefore repeat the same relation with identical particles 
1 and 2 interchanged on both sides, but with the same $\lambda$, and write 


\[ \psi(\mathbf{x}_1, \sigma_1;\, \mathbf{x}_2, \sigma_2;\, \ldots) = \lambda\, \psi(\mathbf{x}_2, \sigma_2;\, \mathbf{x}_1, \sigma_1;\, \ldots) = \lambda^2\, \psi(\mathbf{x}_1, \sigma_1;\, \mathbf{x}_2, \sigma_2;\, \ldots) \]


and therefore $\lambda^2 = 1$. We have only two possibilities, $\lambda = \pm 1$. Our usual 
assumptions regarding locality would not allow the choice of signs to depend 
on whatever other particles are described by the wave function if these particles 
were very far away from particles 1 and 2, and the continuity of wave functions 
would not allow this sign to jump between $+1$ and $-1$ as these other particles 
come close. We conclude that the value of $\lambda$ encountered when we exchange 
a pair of indistinguishable particles can depend only on the species of these 
particles. 
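
The two possibilities $\lambda = \pm 1$ can be made concrete with a toy two-particle wave function. In the sketch below, which is only an illustration, a wave function is a table of amplitudes indexed by a pair of single-particle labels (standing in for position and spin 3-component); the labels `"u"` and `"v"`, the amplitudes, and the function names are all invented for the example.

```python
def exchange(psi):
    """Interchange the labels of particles 1 and 2 in an amplitude table."""
    return {(b, a): amp for (a, b), amp in psi.items()}

def with_exchange_sign(psi, sign):
    """Return psi + sign * (exchanged psi); sign = +1 or -1 (not normalized)."""
    swapped = exchange(psi)
    keys = set(psi) | set(swapped)
    return {k: psi.get(k, 0.0) + sign * swapped.get(k, 0.0) for k in keys}

# An arbitrary table with no particular symmetry
psi = {("u", "v"): 0.6, ("v", "u"): 0.1, ("u", "u"): 0.3}

symmetric = with_exchange_sign(psi, +1)       # lambda = +1
antisymmetric = with_exchange_sign(psi, -1)   # lambda = -1

print(symmetric)
print(antisymmetric)
print(exchange(exchange(psi)) == psi)         # exchanging twice changes nothing
```

Exchanging the labels multiplies `symmetric` by $+1$ and `antisymmetric` by $-1$, and in the antisymmetric case the amplitude for both particles to carry the same label comes out zero.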

Particles for which $\lambda = +1$, so that the wave function is symmetric in the 
labels of these particles, are known as bosons. They are named after Satyendra 
Nath Bose (1894-1974), who first described multi-photon states, imposing this 
symmetry condition.¹⁴ Einstein had Bose’s paper translated into German and 
published, and then applied these ideas to material particles.¹⁵ 

Particles for which $\lambda = -1$, so that the wave function is antisymmetric in 
the labels of these particles, are known as fermions, named after Enrico Fermi 
(1901-1954). Fermi¹⁶ and Dirac¹⁷ at about the same time described 
multi-electron states, imposing this antisymmetry condition. 

It is another consequence of the relativistic quantum theory of fields that 
elementary particles (the quanta of fields) are bosons or fermions according 
to whether their spin is an integer or half an odd integer.¹⁸ The reason for this is
outlined in Section 7.4, but a complete proof is beyond the scope of this book. 
It is easy, though, to see that if this correlation with spin is valid for some set 
of elementary particles then it is valid for any composites of these particles. 
If we interchange two identical composite particles then we are interchanging 
all their constituents, so the interchange gives a minus sign multiplying the wave 
function if each of the composites contains an odd number of fermions and a 
plus sign otherwise, no matter how many bosons it contains. But, according 
to the rules for adding angular momenta described in the previous section, a 
composite has a half odd integer spin if it contains an odd number of half odd 
integer spin particles, and integer spin otherwise, no matter how many integer 
spin particles it contains. So a composite with half odd integer spin contains 
an odd number of fermions, and is therefore a fermion, while if it has integer 
spin it contains an even number of fermions (perhaps zero) and is therefore a 
boson. No other correlation of boson/fermion character with spin would have 
this consistency. 

So electrons, quarks, protons, and neutrons, which have spin 1/2, are
fermions. The spin of massless particles like photons requires special
consideration, but as noted in Section 7.5 the components of their angular
momentum in the direction of travel can only be $\pm\hbar$, corresponding to
left and right circular polarization, and they are bosons. Indeed, as already
mentioned, Bose's original introduction of symmetric states had to do with
photons. Hydrogen and helium atoms are bosons, while ⁶Li atoms (with three
protons, three neutrons, and three electrons) are fermions.
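
The counting rule for composites can be illustrated with a minimal sketch. It assumes that the only thing that matters is the number of fermionic constituents (protons, neutrons, and electrons, each of spin 1/2), and that "hydrogen" and "helium" mean the common isotopes ¹H and ⁴He; the function names are hypothetical and chosen only for this illustration.

```python
def composite_statistics(n_fermions):
    """Classify a composite by the parity of its number of fermionic constituents.

    An odd number of fermions gives a fermion; an even number (including
    zero) gives a boson, however many bosons the composite contains.
    """
    if n_fermions < 0:
        raise ValueError("number of constituents cannot be negative")
    return "fermion" if n_fermions % 2 == 1 else "boson"


def atom_statistics(protons, neutrons, electrons):
    """Statistics of an atom, counting protons, neutrons, and electrons."""
    return composite_statistics(protons + neutrons + electrons)


# Illustrative isotopes (assumed here): (protons, neutrons, electrons).
EXAMPLES = {
    "hydrogen-1": (1, 0, 1),
    "helium-4": (2, 2, 2),
    "lithium-6": (3, 3, 3),
}

for name, counts in EXAMPLES.items():
    print(name, atom_statistics(*counts))
```

The same parity argument gives the spin: an odd number of spin-1/2 constituents always adds up to a half odd integer total spin, so the two classifications cannot disagree.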


14 S. N. Bose, Z. Phys. 26, 178 (1924).

15 A. Einstein, Sitz. Preuss. Akad. Wiss. 1, 3 (1926).

16 E. Fermi, Rend. Lincei 3, 145 (1926).

17 P. A. M. Dirac, Proc. Roy. Soc. A 112, 661 (1926).

18 This was first stated as a general rule by M. Fierz, Helv. Phys. Acta 12, 3 (1939) and W. Pauli, Phys. Rev.
58, 716 (1940).


Statistics


The distinction between bosons and fermions has a profound impact on the
properties of gases in thermal equilibrium. As we did for photons in Section 3.2,
to calculate the densities of particles with various momenta we can imagine a
gas of any identical particles in a cube of volume $L^3$. Since the particles in
a gas are essentially free particles, their momenta are quantized like photon
momenta, with $\mathbf{p} = 2\pi\mathbf{n}\hbar/L$, where $\mathbf{n}$ is a
3-vector with integer components. As we saw in our discussion of momentum space
in Section 5.3, the number of these allowed momentum values in a momentum-space
volume $d^3p$ is


$$ dN = d^3n = (L/2\pi\hbar)^3 \times d^3p. \tag{5.5.1} $$
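
As a rough numerical check of this counting, the sketch below counts the integer vectors n with |n| ≤ R, which label the allowed momenta with |p| ≤ 2πħR/L, and compares the count with the continuum estimate (4π/3)R³ that Eq. (5.5.1) gives for a sphere in momentum space. The function names and the choice of a spherical region are assumptions made only for this illustration.

```python
import math


def count_allowed_momenta(radius_n):
    """Count integer 3-vectors n with |n| <= radius_n.

    Each such n labels one allowed momentum p = 2*pi*hbar*n/L, so this is
    the exact number of allowed momenta with |p| <= 2*pi*hbar*radius_n/L.
    """
    r2 = radius_n * radius_n
    count = 0
    for n1 in range(-radius_n, radius_n + 1):
        for n2 in range(-radius_n, radius_n + 1):
            remainder = r2 - n1 * n1 - n2 * n2
            if remainder < 0:
                continue
            # n3 runs over all integers with n3^2 <= remainder.
            count += 2 * math.isqrt(remainder) + 1
    return count


def continuum_estimate(radius_n):
    """Volume of the sphere in n-space: (L/2*pi*hbar)^3 times (4*pi/3) p_max^3."""
    return 4.0 * math.pi * radius_n ** 3 / 3.0


for radius in (5, 10, 20):
    exact = count_allowed_momenta(radius)
    print(radius, exact, exact / continuum_estimate(radius))
```

The ratio approaches 1 as R grows, which is the sense in which the sum over allowed momenta can be replaced by an integral over $d^3p$.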


The number of particles per volume with momentum in a volume $d^3p$ of
momentum space around one of these allowed momentum values is then

$$ d\mathcal{N}(p) = g\,dN\,\overline{N}_p/L^3 = g\,(2\pi\hbar)^{-3}\,d^3p\,\overline{N}_p, \tag{5.5.2} $$


where $\overline{N}_p$ is the mean number of particles in these states with
momentum $\mathbf{p}$ and any given spin 3-component or helicity. (In Eq.
(5.5.2), we include a factor $g$ equal to the number of spin or helicity states
for each allowed momentum value. For massive particles of spin $s$, we have
$g = 2s + 1$ states, which are characterized by different values of $S_3$,
while for photons $g = 2$.) The mean number $\overline{N}_p$ in a gas with
temperature $T$ and chemical potential $\mu$ is given by the grand canonical
ensemble discussed in Section 2.4:

$$ \overline{N}_p = \frac{\sum_N N \exp(-N[E(p) - \mu]/kT)}{\sum_N \exp(-N[E(p) - \mu]/kT)}, \tag{5.5.3} $$

the sums running over the allowed numbers of particles with momentum
$\mathbf{p}$. It is in these sums that there appears a distinction between
bosons and fermions.

For bosons $N$ runs over all integers from zero to infinity, and we have

$$ \overline{N}_p = \frac{1}{\exp([E(p) - \mu]/kT) - 1}. \tag{5.5.4} $$

This is known as the case of Bose-Einstein statistics. The chemical potential
$\mu$ can be non-zero only if the total number of particles is conserved, so
for photons $\mu = 0$ and the result of using Eq. (5.5.4) in Eq. (5.5.2) (with
the number of polarization states $g = 2$) is equivalent to the Planck
distribution (3.1.14). For material particles such as atoms whose number is
conserved under ordinary conditions we can have $\mu > 0$, and then at very low
temperature $\overline{N}_p$ is very sharply peaked at momenta for which the
energy $E(p)$ is close to $\mu$. It is even possible to have a macroscopic
number of particles with energy $\mu$, a phenomenon known as Bose-Einstein
condensation, first seen by Eric Cornell and Carl Wieman and their
collaborators in a gas of rubidium atoms in 1995.¹⁹ (There is also a sort of
Bose-Einstein condensation in liquid helium, but it is not a good approximation
to treat liquid helium as a gas.)


19 M. H. Anderson et al., Science 269, 198 (1995).
For fermions it is not possible to have more than one particle with a given
momentum $\mathbf{p}$ (and a given spin 3-component), because the wave function
for two such particles would be proportional to

$$ \exp(i\mathbf{p}\cdot\mathbf{x}_1/\hbar)\,\exp(i\mathbf{p}\cdot\mathbf{x}_2/\hbar), $$

which is symmetric rather than antisymmetric in the two particles. Hence the
sums in Eq. (5.5.3) run only over the values $N = 0$ and $N = 1$:

$$ \overline{N}_p = \frac{1}{\exp([E(p) - \mu]/kT) + 1}. \tag{5.5.5} $$

This is known as the case of Fermi-Dirac statistics. For very low temperatures
this takes the form

$$ \overline{N}_p \to \begin{cases} 1 & E(p) < \mu \\ 0 & E(p) > \mu. \end{cases} \tag{5.5.6} $$
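
The way the boson and fermion results both follow from the single ratio of sums in Eq. (5.5.3) can be checked numerically. The sketch below evaluates that ratio directly, with x standing for [E(p) − μ]/kT, and compares it with the closed forms (5.5.4) and (5.5.5). Truncating the boson sum at a large finite N is an assumption of the illustration, and the function names are hypothetical.

```python
import math


def mean_occupation_from_sums(x, allowed_numbers):
    """Evaluate the ratio of sums in Eq. (5.5.3), with x = [E(p) - mu]/kT.

    allowed_numbers lists the particle numbers N included in both sums.
    """
    numbers = list(allowed_numbers)
    weights = [math.exp(-n * x) for n in numbers]
    return sum(n * w for n, w in zip(numbers, weights)) / sum(weights)


def bose_einstein(x):
    """Closed form (5.5.4); requires x > 0."""
    return 1.0 / math.expm1(x)


def fermi_dirac(x):
    """Closed form (5.5.5)."""
    return 1.0 / (math.exp(x) + 1.0)


x = 0.5
# Bosons: N = 0, 1, 2, ... (truncated where the terms are negligible).
print(mean_occupation_from_sums(x, range(2000)), bose_einstein(x))
# Fermions: only N = 0 and N = 1.
print(mean_occupation_from_sums(x, (0, 1)), fermi_dirac(x))
# Low temperature: x is large and negative below mu, large and positive above.
print(fermi_dirac(-50.0), fermi_dirac(50.0))
```

The last line shows the step-function behavior of the fermion occupation number at low temperature: essentially 1 for E(p) < μ and essentially 0 for E(p) > μ.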


The zero-temperature form (5.5.6) is used to derive a relation between the
number densities and energy densities in white dwarf stars, whose high density
requires electrons to have energies much larger than chemical binding energies,
though they are essentially at zero temperature. For white dwarfs of relatively
low mass, $\mu$ is much less than $m_e c^2$ but much larger than chemical
binding energies, so the number density of electrons is given by

$$ n_e = \int d\mathcal{N}(p) = \frac{2}{(2\pi\hbar)^3}\int_0^{p_F} 4\pi p^2\,dp = \frac{8\pi p_F^3}{3(2\pi\hbar)^3}, $$

where $p_F$ is the Fermi momentum defined by $E(p_F) = \mu$. (In practice, we
use a known or assumed value of $n_e$ to calculate $p_F$.) The corresponding
kinetic energy density is

$$ \mathcal{E} = \int E(p)\,d\mathcal{N}(p) = \frac{2}{(2\pi\hbar)^3}\int_0^{p_F} 4\pi p^2\,\frac{p^2}{2m_e}\,dp = \frac{8\pi p_F^5}{10 m_e (2\pi\hbar)^3} $$

$$ \phantom{\mathcal{E}} = (8\pi)^{-2/3}\,(2\pi\hbar)^2\,(3n_e)^{5/3}/10m_e. $$

As shown in Eq. (1.1.3), the pressure of any non-relativistic monatomic gas is
$p = 2\mathcal{E}/3$, so this gives an equation of state for low-mass white
dwarfs:

$$ p = K\rho^{5/3}, \qquad K = (2/3)\,(8\pi)^{-2/3}\,(2\pi\hbar)^2\,(3Z/Am_1)^{5/3}/10m_e, $$

where $\rho = n_e A m_1/Z$ is the mass density.
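
A short numerical sketch may help to show how these formulas fit together. It computes the Fermi momentum from an assumed mass density, the kinetic energy density from the Fermi momentum, and then checks that 2/3 of the energy density agrees with Kρ^{5/3}. The numerical constants are SI values inserted for illustration; taking m_1 equal to the atomic mass unit and Z/A = 1/2 are assumptions, as are the function names.

```python
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # electron mass, kg
M_1 = 1.66053906660e-27     # assumed here: mass for unit atomic weight, kg


def fermi_momentum(n_e):
    """Invert n_e = 8*pi*p_F^3 / (3*(2*pi*hbar)^3) for p_F."""
    return (3.0 * n_e * (2.0 * math.pi * HBAR) ** 3 / (8.0 * math.pi)) ** (1.0 / 3.0)


def kinetic_energy_density(n_e):
    """E = 8*pi*p_F^5 / (10*m_e*(2*pi*hbar)^3) for non-relativistic electrons."""
    p_f = fermi_momentum(n_e)
    return 8.0 * math.pi * p_f ** 5 / (10.0 * M_E * (2.0 * math.pi * HBAR) ** 3)


def eos_constant(z_over_a):
    """The constant K in p = K * rho^(5/3)."""
    return ((2.0 / 3.0) * (8.0 * math.pi) ** (-2.0 / 3.0)
            * (2.0 * math.pi * HBAR) ** 2
            * (3.0 * z_over_a / M_1) ** (5.0 / 3.0) / (10.0 * M_E))


def pressure(rho, z_over_a=0.5):
    """Degeneracy pressure in Pa for mass density rho in kg/m^3."""
    return eos_constant(z_over_a) * rho ** (5.0 / 3.0)


rho = 1.0e9                         # assumed density, kg/m^3
n_e = rho * 0.5 / M_1               # electron number density for Z/A = 1/2
print(pressure(rho), 2.0 * kinetic_energy_density(n_e) / 3.0)
```

The two printed numbers agree to rounding error, which is just the statement that the equation of state is the relation p = 2ℰ/3 rewritten in terms of ρ.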


The Hartree Approximation 


In multi-electron atoms it is often a good approximation to treat each electron 
as moving in a spherically symmetric (but not Coulomb!) effective potential 
arising from the atomic nucleus and from all the other electrons. This is known 
as the Hartree approximation, introduced by Douglas Hartree (1897-1958) in
1928.²⁰ Each electron occupies some one-particle state of definite energy in
this effective potential with corresponding wave functions
$\psi_1(\mathbf{x},\sigma)$, $\psi_2(\mathbf{x},\sigma)$, etc. If electrons were
distinguishable, the wave function for an atomic state with $N$ electrons (with
electron 1 in state 1, electron 2 in state 2, etc.) would be the product

$$ \psi_1(\mathbf{x}_1,\sigma_1)\,\psi_2(\mathbf{x}_2,\sigma_2)\cdots\psi_N(\mathbf{x}_N,\sigma_N). $$


But, because electrons are indistinguishable fermions, this must be
antisymmetrized. The true wave function (up to a normalization constant) is

$$ \psi = \sum_P \delta_P\,\psi_1(\mathbf{x}_{P1},\sigma_{P1})\,\psi_2(\mathbf{x}_{P2},\sigma_{P2})\cdots\psi_N(\mathbf{x}_{PN},\sigma_{PN}), \tag{5.5.7} $$

the sum running over all permutations $P$ of $1, 2, \dots, N$ into
$P1, P2, \dots, PN$, with $\delta_P = +1$ or $\delta_P = -1$ for $P$ an even or
odd permutation, respectively. For instance, for a two-electron state there are
two permutations $P$, the identity $1 \to 1$, $2 \to 2$ with $\delta_P = +1$,
and the interchange $1 \leftrightarrow 2$, with $\delta_P = -1$, so

$$ \psi = \psi_1(\mathbf{x}_1,\sigma_1)\,\psi_2(\mathbf{x}_2,\sigma_2) - \psi_1(\mathbf{x}_2,\sigma_2)\,\psi_2(\mathbf{x}_1,\sigma_1). $$


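In general, the wave function (5.5.7) can be written as a determinant, known as
a Slater determinant:²¹

$$ \psi = \begin{vmatrix} \psi_1(\mathbf{x}_1,\sigma_1) & \psi_1(\mathbf{x}_2,\sigma_2) & \cdots & \psi_1(\mathbf{x}_N,\sigma_N) \\ \psi_2(\mathbf{x}_1,\sigma_1) & \psi_2(\mathbf{x}_2,\sigma_2) & \cdots & \psi_2(\mathbf{x}_N,\sigma_N) \\ \vdots & \vdots & & \vdots \\ \psi_N(\mathbf{x}_1,\sigma_1) & \psi_N(\mathbf{x}_2,\sigma_2) & \cdots & \psi_N(\mathbf{x}_N,\sigma_N) \end{vmatrix} \tag{5.5.8} $$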
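
The equivalence of the permutation sum and the determinant can be checked directly for small N. The sketch below is only an illustration: it uses three made-up real one-particle functions of a single coordinate (spin is ignored), and the function names are hypothetical. It evaluates the sum over permutations, compares it with a determinant computed by expansion along the first row, and confirms the antisymmetry under exchange of two coordinates.

```python
import math
from itertools import permutations


def permutation_sign(perm):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def antisymmetrized(states, coords):
    """Sum over permutations P of delta_P * psi_1(x_P1) * ... * psi_N(x_PN)."""
    total = 0.0
    for perm in permutations(range(len(states))):
        term = float(permutation_sign(perm))
        for i, state in enumerate(states):
            term *= state(coords[perm[i]])
        total += term
    return total


def determinant(matrix):
    """Determinant by expansion along the first row (fine for small N)."""
    if len(matrix) == 1:
        return matrix[0][0]
    result = 0.0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        result += (-1) ** col * entry * determinant(minor)
    return result


def slater_determinant(states, coords):
    """Determinant of the matrix with entries psi_i(x_j)."""
    return determinant([[state(x) for x in coords] for state in states])


# Toy one-particle states, assumed only for illustration.
STATES = [
    lambda x: math.exp(-x * x),
    lambda x: x * math.exp(-x * x),
    lambda x: (2.0 * x * x - 1.0) * math.exp(-x * x),
]

coords = [0.3, -0.7, 1.1]
swapped = [-0.7, 0.3, 1.1]
print(antisymmetrized(STATES, coords), slater_determinant(STATES, coords))
print(antisymmetrized(STATES, swapped))
```

If two of the one-particle states are chosen to be the same function, two rows of the matrix coincide and both expressions vanish up to rounding error.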


The Pauli Exclusion Principle 


None of the one-particle states occupied by electrons in the Hartree
approximation can be the same, for if they were then two rows of the Slater
determinant would be identical, and the wave function would vanish. This
principle was first stated by Pauli,²² on the basis of efforts to understand
the periodic table of the elements, before it became understood that
multi-electron wave functions have to be antisymmetric. The number of values of
$L_3$ for a given $\ell$ is $2\ell + 1$, so Pauli at first thought that not more
than $2\ell + 1$ electrons can have the same $n$ and $\ell$, but as we shall
see, to get the chemistry right it is necessary to assume that the maximum
number of electrons with a given $n$ and $\ell$ is $2(2\ell + 1)$. For this
reason Pauli introduced a new quantum number that takes just two values, which
as discussed in the previous section were identified by Goudsmit and Uhlenbeck
as the two values $S_3 = \pm\hbar/2$ of the 3-component of electron spin.


20 D. H. Hartree, Proc. Camb. Phil. Soc. 24, 111 (1928). 
21 J.C. Slater, Phys. Rev. 34, 1293 (1929). 
22 W. Pauli, Z. Physik 31, 763 (1925). 
Pauli reasoned that as we increase the number Z of electrons in atoms each
added electron must occupy the one-particle state of next largest energy. This is 
why electrons in atoms do not all fall into the two 1s states of lowest energy, so 
that atoms with Z > 2 do not all behave chemically just like helium. 


The Periodic Table 


The Pauli exclusion principle provides an explanation for the periodic table 
of elements, first described in purely chemical terms by Dmitri Ivanovich 
Mendeleev (1834-1907) in 1869, long before atomic structure was understood. 
Of course, Mendeleev knew nothing about electrons but he knew the values 
of atomic weights and could list the elements in order of increasing atomic 
weight. As we saw in Section 5.2, in the twentieth century it became clear that 
the atomic number, defined as the place of an atom in this list, is the same 
(with a few exceptions) as the charge Z of the atomic nucleus in units of e and 
is hence equal to the number of electrons in the atom, on which the chemical 
properties of elements chiefly depend. 

Detailed calculations show that the one-electron states are filled (with spo- 
radic exceptions) in the order 


1s,
2s, 2p,
3s, 3p,
4s, 3d, 4p,
5s, 4d, 5p,
6s, 4f, 5d, 6p,
7s, 5f, 7p, ...                                        (5.5.9)


(We are here ignoring the small fine-structure splitting in the energies of
these states, and so are leaving out subscripts giving the values of $j$.) For
a given $\ell$, increasing $n$ increases the number of nodes of the wave
function, so that the wave function oscillates more with $r$, which increases
the kinetic energy. This is the main reason why electron energies increase
going down the list. But, for a given $n$, the increase in centrifugal force
with increasing $\ell$ decreases the wave function at small $r$ where the
charge interior to $r$ is largest, which decreases the effective absolute value
of the negative potential energy, increasing the state's total energy. Hence,
although the one-electron states listed above on the same line have
approximately equal energy, the energies increase somewhat from left to right.
In the case of 3d, 4d, 4f, 5d, and 5f states and many states with $n > 6$, the
dependence of the energy on $\ell$ turns out to overcome its dependence on $n$.
Taking spin into account, the total numbers of states for the energy levels
listed on each line of Eq. (5.5.9) are 2, 2 + 6 = 8, 2 + 6 = 8,
2 + 10 + 6 = 18, 2 + 10 + 6 = 18, 2 + 14 + 10 + 6 = 32, and so on. These are
substantially the same periodicities that had been discovered chemically by
Mendeleev. For instance, electrons that fill up any one of the lines of the
table are said to form a closed shell. It is energetically unfavorable for
atoms whose electrons just fill closed shells to gain or lose electrons, so
these atoms are chemically inert. They are the noble gases: there is helium
with Z = 2, neon with Z = 2 + 8 = 10, argon with Z = 10 + 8 = 18, krypton with
Z = 18 + 18 = 36, xenon with Z = 36 + 18 = 54, and radon with
Z = 54 + 32 = 86. Elements with one electron outside closed shells find it easy
to lose that electron, which can move freely through the crystal lattice
carrying currents of electricity or of heat. These are the alkali metals:
lithium with Z = 2 + 1 = 3, sodium with Z = 10 + 1 = 11, potassium with
Z = 18 + 1 = 19, and so on. Elements with one electron missing from the highest
energy closed shell react strongly in chemical reactions in which they can gain
an electron. These are the halogens: there is fluorine with Z = 10 − 1 = 9,
chlorine with Z = 18 − 1 = 17, bromine with Z = 36 − 1 = 35, and so on.
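
The arithmetic in this paragraph can be reproduced mechanically. The sketch below is a simple illustration, with hypothetical names, that takes the first six lines of the filling order (5.5.9), assigns each subshell its 2(2ℓ + 1) states, and accumulates the totals to recover the atomic numbers of the noble gases, alkali metals, and halogens.

```python
# Orbital angular momentum for each spectroscopic letter.
ELL = {"s": 0, "p": 1, "d": 2, "f": 3}

# The first six lines of the filling order, one list per line.
FILLING_ORDER = [
    ["1s"],
    ["2s", "2p"],
    ["3s", "3p"],
    ["4s", "3d", "4p"],
    ["5s", "4d", "5p"],
    ["6s", "4f", "5d", "6p"],
]


def subshell_capacity(label):
    """Number of states 2(2l + 1) in a subshell such as '3d'."""
    ell = ELL[label[-1]]
    return 2 * (2 * ell + 1)


def shell_capacities():
    """Total number of states on each line of the filling order."""
    return [sum(subshell_capacity(s) for s in line) for line in FILLING_ORDER]


def closed_shell_numbers():
    """Values of Z at which all lines up to a given one are just filled."""
    totals, running = [], 0
    for capacity in shell_capacities():
        running += capacity
        totals.append(running)
    return totals


noble_gases = closed_shell_numbers()
alkali_metals = [z + 1 for z in noble_gases[:3]]
halogens = [z - 1 for z in noble_gases[1:4]]
print(shell_capacities())
print(noble_gases, alkali_metals, halogens)
```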

More generally, if an atom has a few electrons outside closed shells, it has
what chemists call a positive valence, equal to that number of extra electrons;
if it has a few electrons less than needed to fill closed shells, then it has
negative valence, equal to that number of missing electrons. Thus alkali metals
have valence +1; the so-called alkaline earths beryllium, magnesium, calcium,
etc. have valence +2; the halogens have valence −1; oxygen, sulfur, etc. have
valence −2; and so on. The molecules of many simple chemical compounds
(not all!) are held together by electrostatic attraction between ions of
elements with positive and negative valence that have traded electrons. Since
electrons are neither created nor destroyed in chemistry, in such molecules, if
electrically neutral, the total valence must be zero. These include such
compounds as salts composed of metal and halogen atoms, like sodium chloride,
oxides like calcium oxide, etc. Hydrogen can act as if it has valence +1, as in
water or ammonia, or valence −1, as in metal hydrides.


Diatomic Molecules 


The rotational energy spectrum of molecules like H₂, N₂, O₂, etc. that are
composed of two identical atoms is profoundly affected by the bosonic or
fermionic nature of the atomic nuclei. The energy required to excite rotational
states of molecules is less than the energy required to excite vibrational
states by factors of order $(m_e/Am_1)^{1/2}$, and less than the energy required
to excite electronic states by even smaller factors, of order $m_e/Am_1$, so
the lowest energy states of molecules are rotational states in which the
separations of atomic nuclei and the state of atomic electrons can be regarded
as fixed. In such states the wave function of a molecule consisting of two
identical atoms is proportional to

$$ c_s(\sigma_1,\sigma_2)\,Y_\ell^m(\hat{n}) \pm c_s(\sigma_2,\sigma_1)\,Y_\ell^m(-\hat{n}), \tag{5.5.10} $$


where $\hat{n}$ is a unit vector in the direction from nucleus 1 to nucleus 2;
$Y_\ell^m$ is the usual spherical harmonic, of the sort discussed in Section
5.2; $c_s(\sigma_1,\sigma_2)$ is a spin wave function that depends on the total
spin $s$ of the two nuclei and their individual spins $s_1 = s_2$, as well as on
the spin 3-components $\sigma_1$ and $\sigma_2$, about which more later; and
the sign is +1 or −1 if the nuclei are bosons or fermions, respectively. The
energy of the rotational states with a given $\ell$ is given in quantum
mechanics by replacing $L^2$ in the classical formula $E = L^2/2I$ with
$\hbar^2\ell(\ell + 1)$, so that

$$ E_\ell = \frac{\hbar^2\,\ell(\ell + 1)}{2I}, \tag{5.5.11} $$

with almost no dependence on total spin. Here $I$ is the moment of inertia of
the molecule around a line perpendicular to $\hat{n}$ through the center of mass
of the molecule. Now, $Y_\ell^m(-\hat{n}) = (-1)^\ell\,Y_\ell^m(\hat{n})$. Also,
the spin wave functions have the important symmetry property


$$ c_s(\sigma_2,\sigma_1) = \pm(-1)^s\,c_s(\sigma_1,\sigma_2), \tag{5.5.12} $$


where the sign $\pm$ is $(-1)^{2s_1}$; that is, +1 for adding two equal integer
spins, and −1 for adding two equal half odd integer spins. (In terms of the
Clebsch-Gordan coefficients described in the previous section,


$$ c_s(\sigma_1,\sigma_2) = C_{s_1 s_1}(s\,\sigma;\,\sigma_1\,\sigma_2), $$

where $\sigma = \sigma_1 + \sigma_2$. Equation (5.4.42), applied to the
addition of two equal spins $s_1$ with 3-components $\sigma_1$ and $\sigma_2$
to give total spin $s$ with 3-component $\sigma$, gives

$$ C_{s_1 s_1}(s\,\sigma;\,\sigma_2\,\sigma_1) = (-1)^{s - 2s_1}\,C_{s_1 s_1}(s\,\sigma;\,\sigma_1\,\sigma_2), $$

which is the same as Eq. (5.5.12).) Because of the spin-statistics connection,
the $\pm$ sign in Eq. (5.5.12) is the same as in Eq. (5.5.10). We see then that
these $\pm$ signs cancel, and the only states in which the wave function does
not vanish are therefore those in which

$$ (-1)^\ell = (-1)^s. \tag{5.5.13} $$

Either $s$ and $\ell$ are both even, in which case the molecule is
distinguished by the prefix para, or both are odd, and the prefix is ortho. For
instance, in H₂ we have parahydrogen, with $s = 0$ and $\ell$ even, and
orthohydrogen, with $s = 1$ and $\ell$ odd. The degeneracy of the states is
then $(2\ell + 1)$ for parahydrogen and $3(2\ell + 1)$ for orthohydrogen.




The forces acting on spins are so weak that radiative transitions do not
change $s$ and therefore can only change $\ell$ by an even number. The dominant
transitions are those in which $\ell$ changes by two units, giving a radiated
energy

$$ E_{\ell+2} - E_\ell = \frac{\hbar^2}{2I}\left[(\ell + 2)(\ell + 3) - \ell(\ell + 1)\right] = \frac{\hbar^2}{2I}\left[4\ell + 6\right]. \tag{5.5.14} $$

For para molecules the energies (5.5.14) are $3\hbar^2/I$, $7\hbar^2/I$,
$11\hbar^2/I$, etc., while for ortho molecules they are $5\hbar^2/I$,
$9\hbar^2/I$, $13\hbar^2/I$, etc. Observing this pattern of energies, with
proportions 3 : 7 : 11 : ··· or 5 : 9 : 13 : ···, it is possible to judge which
transitions are in para and which in ortho molecules, even if one does not know
the moment of inertia $I$.
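
The two series of transition energies follow from the rotational energies ħ²ℓ(ℓ + 1)/2I by simple arithmetic, as the following sketch shows. Energies are expressed in units of ħ²/I and exact fractions are used so that no rounding enters; the function names are hypothetical.

```python
from fractions import Fraction


def rotational_energy(ell):
    """Rotational energy l(l + 1)/2 in units of hbar^2 / I."""
    return Fraction(ell * (ell + 1), 2)


def transition_energy(ell):
    """Energy radiated when l + 2 drops to l, in units of hbar^2 / I."""
    return rotational_energy(ell + 2) - rotational_energy(ell)


def transition_series(kind, count):
    """First few transition energies for 'para' (l even) or 'ortho' (l odd)."""
    if kind not in ("para", "ortho"):
        raise ValueError("kind must be 'para' or 'ortho'")
    start = 0 if kind == "para" else 1
    return [transition_energy(ell) for ell in range(start, start + 2 * count, 2)]


print([int(e) for e in transition_series("para", 3)])
print([int(e) for e in transition_series("ortho", 3)])
```

Because only the ratios within each series matter for identifying para and ortho lines, the unknown moment of inertia drops out of the comparison.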

The energy $\hbar^2/2I$ is typically much less than $kT$, so the abundance of
diatomic molecules in a state with given $s$ and $\ell$ is simply proportional
to the degeneracy $(2s + 1)(2\ell + 1)$. (For instance, for hydrogen
$\hbar^2/2I = k \times 45$ K.) The observed transitions are typically between
states with $\ell \gg 1$, and the intensity of the radiation emitted is mostly
a matter of the number of spin states for $\ell$ and $s$ even or odd, as
follows. If the spin $s_1$ of each nucleus is an integer, so that they are
bosons, then the allowed even values of $s$ are $2s_1, 2s_1 - 2, \dots, 0$, and
the allowed odd values of $s$ are $2s_1 - 1, 2s_1 - 3, \dots, 1$. Hence in this
case the total number of spin states for para and ortho molecules is

$$ \#_{\rm para} = \sum_{n=0}^{s_1}\left[2(2n) + 1\right] = (s_1 + 1)(2s_1 + 1), $$

$$ \#_{\rm ortho} = \sum_{n=1}^{s_1}\left[2(2n - 1) + 1\right] = s_1(2s_1 + 1), $$

and the ratio of the intensities of para and ortho transitions is

$$ \frac{\rm para}{\rm ortho} = \frac{s_1 + 1}{s_1} \quad \text{(bosons)}. \tag{5.5.15} $$

On the other hand, if $s_1$ is half an odd integer, so that the nuclei are
fermions, then the allowed even values of $s$ are $2s_1 - 1, 2s_1 - 3, \dots, 0$
and the allowed odd values of $s$ are $2s_1, 2s_1 - 2, \dots, 1$. Hence for
$s_1$ a half odd integer the total number of spin states for ortho and para
molecules is the same as the number of spin states for para and ortho molecules
in the case where $s_1$ is an integer, and the ratio of the intensities of para
and ortho transitions is the reciprocal of the ratio (5.5.15):

$$ \frac{\rm para}{\rm ortho} = \frac{s_1}{s_1 + 1} \quad \text{(fermions)}. \tag{5.5.16} $$

(For example, for hydrogen $s_1 = 1/2$, so the abundance of parahydrogen is
about one-third that of orthohydrogen, and the total intensity of radiation
emitted or absorbed in transitions in parahydrogen is about one-third that for
orthohydrogen.) Evidently one can tell whether nuclei are bosons or fermions
just by observing whether radiation from the para or ortho transitions is
stronger. In the next chapter we will see that observations of the diatomic
nitrogen molecule presented a puzzle regarding the nature of the nitrogen
nucleus that was only resolved with the discovery of the neutron.
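
The counting of spin states behind these ratios is easy to verify by brute force. The sketch below is a minimal illustration with hypothetical names: it takes twice the nuclear spin as an integer argument (so that half odd integer spins can be represented exactly), lets the total spin s run from 0 to 2s₁ in integer steps, and adds up the 2s + 1 states for even and odd s separately.

```python
from fractions import Fraction


def spin_state_counts(two_s1):
    """Return (number of spin states with s even, number with s odd).

    two_s1 is twice the spin of each nucleus, so two_s1 = 1 means spin 1/2.
    The total spin s of two equal spins runs over 0, 1, ..., 2*s1.
    """
    if two_s1 < 0:
        raise ValueError("spin cannot be negative")
    even_states = odd_states = 0
    for s in range(two_s1 + 1):
        if s % 2 == 0:
            even_states += 2 * s + 1
        else:
            odd_states += 2 * s + 1
    return even_states, odd_states


def para_to_ortho_ratio(two_s1):
    """Ratio of para (s even) to ortho (s odd) spin states."""
    even_states, odd_states = spin_state_counts(two_s1)
    if odd_states == 0:
        raise ValueError("no ortho states exist for spin-zero nuclei")
    return Fraction(even_states, odd_states)


for two_s1 in (1, 2, 3):
    print(Fraction(two_s1, 2), spin_state_counts(two_s1), para_to_ortho_ratio(two_s1))
```

For integer s₁ the ratio comes out as (s₁ + 1)/s₁ and for half odd integer s₁ as s₁/(s₁ + 1), in agreement with Eqs. (5.5.15) and (5.5.16).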

Clouds of interstellar diatomic molecules can cool to quite low temperatures 
by collisional excitation of rotational energy levels, after which the excitation 
energy is emitted as radiation that leaves the cloud. This is an important feature 
in the formation of stars by gravitational condensation of interstellar matter, 
which requires low temperatures to mitigate pressure forces that can prevent 
condensation. But for cooling, it is necessary that radiation should often be 
emitted before the molecule gives its excitation energy back to the cloud in 
another collision. 

This is an obstacle to cooling by diatomic molecules with identical atoms. As
discussed in Section 7.5, the fastest radiative transitions in atoms and
molecules generally are electric dipole transitions, in which there is a
non-zero value for $\int \psi^*_{\rm final}\,\mathbf{P}\,\psi_{\rm initial}$
(where $\mathbf{P}$ is the momentum of the radiating particle). Since
$\mathbf{P}$ is a three-vector that changes sign under reflection of
coordinates, this integral vanishes unless certain selection rules are obeyed:
when spin effects are neglected, the initial and final states must have
opposite signs for the parity $(-1)^\ell$ and must have values of $\ell$ that
differ by no more than one unit. Neither selection rule is satisfied by the
transitions in diatomic molecules with identical atoms, in which $\ell$ changes
by two units. These are what in Section 7.5 are called electric quadrupole
transitions, which are much slower than electric dipole transitions. Thus,
although H₂ is by far the most common molecule in interstellar space, it
contributes little to the cooling of molecular clouds.

On the other hand, in diatomic molecules with distinguishable atoms radiative
transitions can occur rapidly as electric dipole transitions in which $\ell$
changes by one unit, and these molecules when excited by collisions often lose
energy by radiation rather than in further collisions. Of the more abundant
molecules of this sort, the most effective at cooling interstellar clouds is
CO. This molecule has a large moment of inertia, with
$\hbar^2/Ik \approx 5.5$ K, so it can cool clouds to very low temperatures. The
hydroxyl molecule OH is more abundant but has a smaller moment of inertia and
hence larger excitation energies, so it cannot cool clouds to temperatures as
low as can CO.


5.6 Scattering 


Much of atomic, nuclear, and elementary particle physics is based on data
gained from the scattering of particles in collisions with other particles. In
the main body of this section we will consider scattering processes only in the
case that is simplest kinematically: the scattering of a particle by a much
heavier particle, such as the scattering of alpha particles by nuclei of
various metals in the 1911 experiment that led Rutherford to the discovery of
the atomic nucleus. In this case we can approximate the effect of the heavy
target particle by taking it to be at rest at the origin of coordinates, and
representing its interaction with the scattered particle as a fixed external
potential $V(\mathbf{x})$ that depends only on the coordinate of the scattered
particle. This is a good approximation for some scattering processes of
historical importance. Moreover, as we shall see, it was the study of
scattering using this approximation that led to the probabilistic
interpretation of quantum mechanics. An appendix to this section considers the
calculation of more general scattering and decay processes with any number of
particles of any type in the initial and final states.


Scattering Wave Function 


We again use the time-independent Schrödinger equation

$$ -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{x}) + V(\mathbf{x})\,\psi(\mathbf{x}) = E\,\psi(\mathbf{x}), \tag{5.6.1} $$

where $V(\mathbf{x}) \to 0$ for $|\mathbf{x}| \to \infty$. But now instead of
treating bound states with $E < 0$, we here consider a particle with $E > 0$
that comes into the range of the potential from an infinite distance and then
recedes to infinity. We define a wave number $k > 0$ by

$$ E = \frac{\hbar^2 k^2}{2m}, \tag{5.6.2} $$

and we rewrite the Schrödinger equation as

$$ (\nabla^2 + k^2)\,\psi(\mathbf{x}) = \frac{2m}{\hbar^2}\,V(\mathbf{x})\,\psi(\mathbf{x}). \tag{5.6.3} $$

When $\mathbf{x}$ is far outside the range of the potential there is an
asymptotic solution of Eq. (5.6.3) that approaches a plane wave
$\exp(ikx_3)/(2\pi\hbar)^{3/2}$ (conveniently normalized), which represents a
particle coming in from infinity along the 3-axis. We seek a solution of Eq.
(5.6.3) with this asymptotic form:

$$ \psi(\mathbf{x}) \to \frac{e^{ikx_3}}{(2\pi\hbar)^{3/2}} \tag{5.6.4} $$

for $|\mathbf{x}| \to \infty$. To find such a solution, we replace Eq. (5.6.3)
with an integral equation that incorporates the boundary condition (5.6.4):

$$ \psi(\mathbf{x}) = \frac{e^{ikx_3}}{(2\pi\hbar)^{3/2}} + \frac{2m}{\hbar^2}\int d^3x'\,G_k(\mathbf{x} - \mathbf{x}')\,V(\mathbf{x}')\,\psi(\mathbf{x}'), \tag{5.6.5} $$
where $G_k(\mathbf{x} - \mathbf{x}')$ is a Green's function (named after the
nineteenth century mathematician George Green (1793-1841)) satisfying the
conditions

$$ (\nabla^2 + k^2)\,G_k(\mathbf{x} - \mathbf{x}') = \delta^3(\mathbf{x} - \mathbf{x}'), \tag{5.6.6} $$

$$ G_k(\mathbf{x} - \mathbf{x}') \to 0 \quad \text{for } |\mathbf{x} - \mathbf{x}'| \to \infty. \tag{5.6.7} $$

Here $\delta^3(\mathbf{x} - \mathbf{x}')$ is the Dirac delta function, defined
by the condition

$$ \int d^3x'\,\delta^3(\mathbf{x} - \mathbf{x}')\,f(\mathbf{x}') = f(\mathbf{x}) \tag{5.6.8} $$

for any sufficiently smooth function $f(\mathbf{x})$.


Representations of the Delta Function 


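Of course, there is no function for which Eq. (5.6.8) is literally satisfied
but, by taking $\delta^3(\mathbf{x} - \mathbf{x}')$ to be very large when
$\mathbf{x}$ is very close to $\mathbf{x}'$ and very small otherwise, we can
come arbitrarily near to satisfying Eq. (5.6.8). For example, we can take

$$ \delta^3(\mathbf{x} - \mathbf{x}') = \frac{1}{\pi^{3/2} d^3}\exp\left(-(\mathbf{x} - \mathbf{x}')^2/d^2\right), $$

where $d$ is some very small length. It is more convenient here to use another
well-known representation of the delta function:

$$ \delta^3(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^3}\int d^3q\,\exp\left(i\mathbf{q}\cdot(\mathbf{x} - \mathbf{x}')\right). \tag{5.6.9} $$

With this representation, Eq. (5.6.8) is the fundamental theorem of Fourier
analysis: if

$$ g(\mathbf{q}) = \int d^3x'\,e^{-i\mathbf{q}\cdot\mathbf{x}'}\,f(\mathbf{x}') $$

then

$$ f(\mathbf{x}) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{x}}\,g(\mathbf{q}) = \frac{1}{(2\pi)^3}\int d^3q\,e^{i\mathbf{q}\cdot\mathbf{x}}\int d^3x'\,e^{-i\mathbf{q}\cdot\mathbf{x}'}\,f(\mathbf{x}'), $$

which with an interchange of the order of integration (discarding mathematical
rigor) is the same as Eq. (5.6.8). If the wave function
$\varphi_{\mathbf{p}}(\mathbf{x})$ for a free particle of momentum $\mathbf{p}$
is defined so that

$$ \varphi_{\mathbf{p}}(\mathbf{x}) = (2\pi\hbar)^{-3/2}\exp(i\mathbf{p}\cdot\mathbf{x}/\hbar) $$

then Eq. (5.6.9) gives these wave functions a simple delta-function
normalization

$$ \int d^3x\,\varphi^*_{\mathbf{p}'}(\mathbf{x})\,\varphi_{\mathbf{p}}(\mathbf{x}) = \delta^3(\mathbf{p}' - \mathbf{p}), $$

which is why we inserted a denominator $(2\pi\hbar)^{3/2}$ in Eq. (5.6.4). As we
shall see, we can derive valid results by manipulating
$\delta^3(\mathbf{x} - \mathbf{x}')$ as if it were a well-defined function.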


Calculation of the Green’s Function 


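Using Eq. (5.6.9), we can easily write a solution of the differential equation
(5.6.6):

$$ G_k(\mathbf{x} - \mathbf{x}') = \frac{1}{(2\pi)^3}\int d^3q\,\frac{\exp\left(i\mathbf{q}\cdot(\mathbf{x} - \mathbf{x}')\right)}{k^2 - q^2 + i\epsilon}, \tag{5.6.10} $$

where $\epsilon$ is a positive infinitesimal that makes the integral
well-defined despite the singularity at $|\mathbf{q}| = k$. (The reason for
taking $\epsilon$ positive will be made clear below.)

The integral over the directions of $\mathbf{q}$ in Eq. (5.6.10) gives

$$ G_k(\mathbf{x} - \mathbf{x}') = \frac{4\pi}{(2\pi)^3}\int_0^\infty q^2\,dq\,\frac{\sin q|\mathbf{x} - \mathbf{x}'|}{q|\mathbf{x} - \mathbf{x}'|}\,\frac{1}{k^2 - q^2 + i\epsilon} $$

$$ \phantom{G_k(\mathbf{x} - \mathbf{x}')} = \frac{4\pi}{(2\pi)^3}\,\frac{1}{2i|\mathbf{x} - \mathbf{x}'|}\int_{-\infty}^{\infty} q\,dq\,\frac{\exp\left(iq|\mathbf{x} - \mathbf{x}'|\right)}{k^2 - q^2 + i\epsilon}. \tag{5.6.11} $$

We can evaluate the integral over $q$ by closing the contour with a very large
semicircle in the upper half of the complex plane, on which the integrand is
exponentially small. Since $\epsilon$ is infinitesimal, we can write

$$ \frac{1}{k^2 - q^2 + i\epsilon} = \frac{1}{k + i\epsilon/2k - q}\;\frac{1}{k + i\epsilon/2k + q} $$

and evaluate the integral as $2\pi i$ times the residue of the pole inside the
contour, at $q = k + i\epsilon/2k$, and then take $\epsilon \to 0$:

$$ G_k(\mathbf{x} - \mathbf{x}') = -\frac{1}{4\pi|\mathbf{x} - \mathbf{x}'|}\exp\left(ik|\mathbf{x} - \mathbf{x}'|\right), \tag{5.6.12} $$

so that $G_k(\mathbf{x} - \mathbf{x}')$ satisfies the boundary condition (5.6.7).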
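
The result for the Green's function can be checked numerically without any contour integration. The sketch below, with hypothetical function names, takes G as a function of r = |x − x′|, equal to −exp(ikr)/(4πr), and tests two things by finite differences: that (∇² + k²)G vanishes away from r = 0, using the radial form ∇²G = (1/r) d²(rG)/dr² for a spherically symmetric function, and that the flux of ∇G through a small sphere around the origin is close to 1, as required by the delta function on the right-hand side of Eq. (5.6.6). The step sizes are arbitrary choices made for the illustration.

```python
import cmath
import math


def green_function(r, k):
    """Outgoing-wave Green's function -exp(i k r) / (4 pi r) at distance r > 0."""
    return -cmath.exp(1j * k * r) / (4.0 * math.pi * r)


def helmholtz_residual(r, k, h=1e-3):
    """Finite-difference value of (laplacian + k^2) G at distance r > h."""
    def u(s):
        return s * green_function(s, k)
    laplacian = (u(r + h) - 2.0 * u(r) + u(r - h)) / (h * h * r)
    return laplacian + k * k * green_function(r, k)


def flux_through_sphere(r, k):
    """4 pi r^2 dG/dr, the surface integral of grad G over a sphere of radius r."""
    h = 1e-3 * r
    derivative = (green_function(r + h, k) - green_function(r - h, k)) / (2.0 * h)
    return 4.0 * math.pi * r * r * derivative


k = 2.0
print(abs(helmholtz_residual(1.5, k)))      # small: G solves the equation for r > 0
print(flux_through_sphere(1e-3, k))         # close to 1: unit delta-function source
print(abs(green_function(1e6, k)))          # small: boundary condition at infinity
```

Replacing exp(ikr) by exp(−ikr) would pass the same three checks, which is why the sign of ε, and not the differential equation alone, is what selects the outgoing wave.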


The Scattering Amplitude 
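Using Eq. (5.6.12) in Eq. (5.6.5) gives

$$ \psi(\mathbf{x}) = \frac{e^{ikx_3}}{(2\pi\hbar)^{3/2}} - \frac{m}{2\pi\hbar^2}\int d^3x'\,\frac{1}{|\mathbf{x} - \mathbf{x}'|}\exp\left(ik|\mathbf{x} - \mathbf{x}'|\right)V(\mathbf{x}')\,\psi(\mathbf{x}'). \tag{5.6.13} $$

In the limit when $|\mathbf{x}|$ is much larger than the values of
$\mathbf{x}'$ at which $V(\mathbf{x}')$ is appreciable, we can use the
approximation

$$ |\mathbf{x} - \mathbf{x}'| \to r\sqrt{1 - 2\mathbf{x}\cdot\mathbf{x}'/r^2} \to r - \hat{x}\cdot\mathbf{x}', $$

where $r = |\mathbf{x}|$ and $\hat{x} = \mathbf{x}/r$. This gives

$$ \psi(\mathbf{x}) \to \frac{1}{(2\pi\hbar)^{3/2}}\left[e^{ikx_3} + f(\hat{x})\,\frac{e^{ikr}}{r}\right], \tag{5.6.14} $$

where $f(\hat{x})$ is the scattering amplitude

$$ f(\hat{x}) = -\frac{m}{2\pi\hbar^2}\,(2\pi\hbar)^{3/2}\int d^3x'\,\exp(-ik\hat{x}\cdot\mathbf{x}')\,V(\mathbf{x}')\,\psi(\mathbf{x}'). \tag{5.6.15} $$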


Probabilistic Interpretation 


At a distance $r$ from the scattering center that is not only large compared
with the range of the potential but also much greater than the wavelength
$2\pi/k$, the second term in Eq. (5.6.14) at any given direction $\hat{x}$
behaves like a plane wave moving outward with wave vector $k\hat{x}$. This is a
familiar behavior for all sorts of waves. A plane ocean wave encountering an
obstacle in the water will break up and spread out in all directions, just as
in Eq. (5.6.14). But a particle like an alpha particle in Rutherford's
laboratory encountering a target like a gold nucleus does not break up. It
hangs together, and is scattered in some definite direction, though not a
direction that can be predicted in advance. This showed that $\psi(\mathbf{x})$
or $|\psi(\mathbf{x})|^2$ cannot represent how much of the scattered particle
is at $\mathbf{x}$. It was this remark about scattering that led Max Born
(1882-1970) in 1926 to propose²³ that if $\psi$ is suitably normalized then
$|\psi(\mathbf{x})|^2$ is the probability density at $\mathbf{x}$; that is,
$|\psi(\mathbf{x})|^2 d^3x$ is the probability that the particle is in a small
volume $d^3x$ around $\mathbf{x}$.

For a proper treatment of what happens in scattering it is necessary to
consider a wave function that at early times is a packet of free-particle
waves, as in Eq. (5.1.3), and use the time-dependent Schrödinger equation to
follow the subsequent scattering. This is the approach followed in the appendix
to this section. But, with a moderate amount of hand-waving, we can derive the
most important results more simply, just using Eq. (5.6.14).

23 M. Born, Z. Phys. 38, 803 (1926).

Suppose that at some early time before the scattering the incoming particle is
in a thin disk of area $A$ and thickness $L$ at right angles to the path of the
particle. In order for $|\psi|^2$ to serve as a probability density, we have to
arrange that $e^{ikx_3}$ comes with a factor $1/\sqrt{AL}$ instead of
$1/(2\pi\hbar)^{3/2}$, so that the integral of $|\psi|^2$ over the disk at early
times is unity. The scattering wave function (5.6.14) will then also be
multiplied by $(2\pi\hbar)^{3/2}/\sqrt{AL}$. At a late time $t$ after the
collision a scattered particle will be in a thin disk of the same thickness $L$
at a distance $r = vt$ from the scattering center (where $v = \sqrt{2E/m}$).
The probability density at position $r\hat{x}$ will be

$$ |\psi|^2 = \frac{|f(\hat{x})|^2/r^2}{AL} $$

and the probability $dP$ that the particle will be in a small solid angle
$d\Omega$ around $\hat{x}$ is this probability density times the volume of a
disk of thickness $L$ and area $r^2 d\Omega$:

$$ dP = \frac{|f(\hat{x})|^2/r^2}{AL} \times L r^2\,d\Omega = |f(\hat{x})|^2\,d\Omega/A. \tag{5.6.16} $$


This is the same probability as if the particle by chance had to hit a tiny
target area $d\sigma = |f(\hat{x})|^2 d\Omega$ somewhere within the larger area
$A$ in order to be scattered into a solid angle $d\Omega$ around $\hat{x}$. The
ratio of the target area $d\sigma$ to the solid angle $d\Omega$ is then

$$ \frac{d\sigma}{d\Omega} = |f(\hat{x})|^2 = \frac{2\pi m^2}{\hbar}\left|\int d^3x'\,\exp(-ik\hat{x}\cdot\mathbf{x}')\,V(\mathbf{x}')\,\psi(\mathbf{x}')\right|^2 \tag{5.6.17} $$

and is known as the differential cross section. Much of modern theoretical and
experimental physics consists of the calculation and measurement of
differential cross sections.

Now we can see why it was necessary to take $\epsilon$ positive in Eq. (5.6.10)
for the Green's function. With $\epsilon$ negative the integral (5.6.11) over
$q$ would still be well-defined and we could still evaluate it by closing the
contour of integration with a large semicircle in the upper half complex plane
of $q$, on which the factor $\exp(iq|\mathbf{x} - \mathbf{x}'|)$ is
exponentially small. Only now, with $\epsilon$ negative, the pole in the
integrand in the upper half of the complex plane would be at
$q = -k - i\epsilon/2k$, and in the asymptotic form of the wave function the
factor $\exp(ikr)$ would be replaced with $\exp(-ikr)$. Instead of a wave going
out in all directions to large distances, as in Eq. (5.6.14), this would
represent a wave coming into the potential along all directions from a great
distance, which is not what happens in any scattering process.


The Born Approximation 


Equation (5.6.5) is of course not in itself a solution of the differential equation
(5.6.3), because $\psi$ appears on the right-hand side of the equation, as well as on
the left. But it does suggest a solution that is a good approximation if $2m|V|/\hbar^2$
is everywhere much less than $k^2$. In this case we can approximate $\psi$ on the
right-hand side of Eq. (5.6.5) with the term of zeroth order in $V$, that is, with
$e^{ikx_3}/(2\pi\hbar)^{3/2}$:

$$\psi(\mathbf{x}) \approx \frac{1}{(2\pi\hbar)^{3/2}} \left[ e^{ikx_3} + \frac{2m}{\hbar^2} \int d^3x'\, G_k(\mathbf{x} - \mathbf{x}')\, V(\mathbf{x}')\, e^{ikx'_3} \right] . \qquad (5.6.18)$$

This is known as the Born approximation. Repeating our earlier calculation of
the scattering amplitude, or just jumping back to Eq. (5.6.15) and replacing
$\psi(\mathbf{x}')$ with $e^{ikx'_3}/(2\pi\hbar)^{3/2}$, gives the corresponding approximation for the
scattering amplitude:

$$f(\hat{\mathbf{x}}) = -\frac{m}{2\pi\hbar^2} \int d^3x'\, \exp(-ik\hat{\mathbf{x}} \cdot \mathbf{x}')\, V(\mathbf{x}')\, e^{ikx'_3} . \qquad (5.6.19)$$

This formula becomes particularly simple in the frequently encountered case
in which the potential is spherically symmetric. We can write Eq. (5.6.19) in
this case as

$$f(\hat{\mathbf{x}}) = -\frac{m}{2\pi\hbar^2} \int d^3x'\, \exp(i\mathbf{K} \cdot \mathbf{x}')\, V(r') ,$$

where

$$\mathbf{K} \equiv k(\hat{\mathbf{z}} - \hat{\mathbf{x}}) .$$

Here $\hat{\mathbf{z}}$ is a unit vector in the 3-direction, the direction of the original particle
velocity. The integral over the direction of $\mathbf{x}'$ is then easy:

$$f(\hat{\mathbf{x}}) = -\frac{m}{2\pi\hbar^2} \int_0^\infty 4\pi r'^2\, V(r')\, \frac{\sin(Kr')}{Kr'}\, dr' = -\frac{2m}{K\hbar^2} \int_0^\infty V(r') \sin(Kr')\, r'\, dr' , \qquad (5.6.20)$$

where

$$K \equiv |\mathbf{K}| = k\sqrt{2 - 2\cos\theta} = k\sqrt{2 - 2[1 - 2\sin^2(\theta/2)]} = 2k \sin(\theta/2) , \qquad (5.6.21)$$

and $\theta$ is the angle between the incident direction $\hat{\mathbf{z}}$ and the scattered
direction $\hat{\mathbf{x}}$. It is a special feature of the Born approximation for spherically
symmetric potentials that the scattering amplitude depends on $k$ and $\theta$ only in the
combination $K$.
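
The angular integration behind Eq. (5.6.20) can be spelled out as a short worked step. This is only a sketch: the variable $\mu$, the cosine of the angle between $\mathbf{x}'$ and $\mathbf{K}$, is introduced here for the purpose and is not a symbol used in the text. Taking the polar axis along $\mathbf{K}$,

$$\int d^3x'\, e^{i\mathbf{K}\cdot\mathbf{x}'}\, V(r') = 2\pi \int_0^\infty r'^2\, V(r')\, dr' \int_{-1}^{1} d\mu\, e^{iKr'\mu} = 4\pi \int_0^\infty r'^2\, V(r')\, \frac{\sin(Kr')}{Kr'}\, dr' .$$

Multiplying by $-m/2\pi\hbar^2$ gives the radial integral in Eq. (5.6.20), which depends on the scattering angle only through $K$.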


Coulomb Scattering 
For an important example of the Born approximation, consider a shielded
Coulomb potential

$$V(r) = \frac{Z_1 Z_2 e^2}{r} \exp(-\kappa r) . \qquad (5.6.22)$$


This is a rough approximation to the Coulomb energy of a scattered particle of
charge $Z_2 e$ in the electric field of an atom whose nucleus has charge $Z_1 e$. The
full electrostatic potential of the nucleus is felt by the scattered particle when the
particle is closer to the nucleus than the electronic orbits, taken to have typical
radii of order $1/\kappa$, but the potential vanishes when the scattered particle is far
enough from the atom for the orbiting electrons to completely shield the charge
of the nucleus. (This potential is also known as a Yukawa potential, because,
as we will see in Section 7.3, in 1935 Hideki Yukawa (1907–1981) showed that
the exchange of a meson of mass $\hbar\kappa/c$ between two nuclear particles would
produce such a potential, though of course with some other constant factor in
place of $Z_1 Z_2 e^2$.) Using this in Eq. (5.6.20) gives a scattering amplitude

$$f(\hat{\mathbf{x}}) = -\frac{2m Z_1 Z_2 e^2}{\hbar^2}\, \frac{1}{K^2 + \kappa^2} , \qquad (5.6.23)$$

with $K$ given by Eq. (5.6.21). We can find the scattering amplitude for a pure
Coulomb potential by just taking $\kappa = 0$ in Eq. (5.6.23). This result is only valid
to first order in $Z_1 Z_2 e^2$, but a calculation of higher-order corrections shows that
for $\kappa = 0$ these higher-order corrections change the scattering amplitude only
by a phase factor, which has no effect on the differential cross section (5.6.17),
so in this case

$$\frac{d\sigma}{d\Omega} = \frac{4m^2 Z_1^2 Z_2^2 e^4}{\hbar^4 K^4} , \qquad (5.6.24)$$

which holds even beyond the Born approximation. This is the same as the
formula calculated classically in 1911 by Rutherford, following the hyperbolic
trajectory of the alpha particle to find the area $d\sigma$ that it must hit to reach a
given direction within a solid angle $d\Omega$. Rutherford's calculation would not
have given the correct scattering probability for a general potential, except at
very short wavelength. It was just good luck that for Coulomb scattering the
classical calculation gives the right answer for general wavelengths.
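
The step from Eq. (5.6.20) to Eq. (5.6.23) can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates the radial integral of Eq. (5.6.20) for the shielded Coulomb potential by the trapezoidal rule and compares it with the closed form (5.6.23). The function names, the parameter `strength` (standing for $Z_1 Z_2 e^2$), the cutoff `r_max`, and the choice of units with $m = \hbar = 1$ are all assumptions made for the illustration.

```python
import math

def momentum_transfer(k, theta):
    """Eq. (5.6.21): K = 2 k sin(theta / 2)."""
    return 2.0 * k * math.sin(theta / 2.0)

def born_amplitude_closed(K, kappa, strength, m=1.0, hbar=1.0):
    """Eq. (5.6.23), with `strength` standing for Z1*Z2*e^2."""
    return -2.0 * m * strength / (hbar**2 * (K**2 + kappa**2))

def born_amplitude_numeric(K, kappa, strength, m=1.0, hbar=1.0,
                           r_max=80.0, steps=200_000):
    """Eq. (5.6.20) by the trapezoidal rule for V(r) = strength*exp(-kappa*r)/r."""
    h = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        # V(r) * sin(K r) * r reduces to strength * exp(-kappa r) * sin(K r),
        # which is finite at r = 0.
        value = strength * math.exp(-kappa * r) * math.sin(K * r)
        total += 0.5 * value if i in (0, steps) else value
    return -2.0 * m / (K * hbar**2) * total * h

def rutherford_cross_section(k, theta, strength, m=1.0, hbar=1.0):
    """Eq. (5.6.24): |f|^2 for the unshielded case kappa = 0."""
    K = momentum_transfer(k, theta)
    return born_amplitude_closed(K, 0.0, strength, m, hbar) ** 2

K = momentum_transfer(1.0, math.pi / 3.0)
print(born_amplitude_closed(K, 0.5, 1.0), born_amplitude_numeric(K, 0.5, 1.0))
```

The cutoff `r_max` has to be many multiples of $1/\kappa$ for the truncated integral to be reliable, so this numerical route does not work at $\kappa = 0$, where the closed form is used directly.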


Appendix: General Transition Rates 


So far we have considered only the scattering of a single non-relativistic particle 
by a fixed scattering center. Nature presents us with a much wider variety of 
processes, in which any number of particles coming together from large sep- 
arations in an initial state interact, producing some number of particles (not 
necessarily the same number) that then go out to large separations in a final 
state. These processes range from the decay of a single particle to the collision 
of any number of relativistic or non-relativistic particles, producing any other 
particles. This appendix describes a very general formalism for the calculation 
of the rates of all such processes. 
We consider a Hamiltonian of the general form

$$H = H_0 + V \qquad (5.6.25)$$

in which the two terms are distinguished by the condition that the eigenfunctions
of $H_0$ represent states of free particles, such as those that are present long before
or long after a collision, while $V$ is an interaction that becomes negligible when
these particles are very far apart. For instance, for non-relativistic processes $H_0$
is the operator representing the total kinetic energy. The eigenfunctions $\varphi_\alpha$ of
$H_0$ satisfy

$$H_0 \varphi_\alpha = E_\alpha \varphi_\alpha . \qquad (5.6.26)$$

Here $\alpha$ labels the species, three-momenta, and spin z-components (or helicities)
of all the particles in the state $\alpha$ represented by $\varphi_\alpha$, and $E_\alpha$ is the sum of
the kinetic plus mass energies of these particles. These wave functions can be
normalized so that

$$\int \varphi_\beta^* \varphi_\alpha = \delta(\beta - \alpha) , \qquad (5.6.27)$$

with the understanding that $\delta(\beta - \alpha)$ vanishes unless the numbers of particles
in the states $\alpha$ and $\beta$ and the species and spin components of the corresponding
particles in these states are all equal, and where they are equal it is given by
a product of Dirac delta functions for the three-momentum of each particle.
(In Eq. (5.6.27) we continue to use the abbreviation, that in $\int \varphi_\beta^* \varphi_\alpha$ we
integrate over all coordinates and sum over all spin 3-components on which both
wave functions depend.) To be explicit, for wave functions representing
free-particle states containing respectively $N$ and $N'$ particles, we have

$$\int \varphi^*_{n'_1,\sigma'_1,\mathbf{p}'_1;\, \ldots;\, n'_{N'},\sigma'_{N'},\mathbf{p}'_{N'}}\; \varphi_{n_1,\sigma_1,\mathbf{p}_1;\, \ldots;\, n_N,\sigma_N,\mathbf{p}_N} = \delta_{N'N}\, \delta_{\sigma'_1 \sigma_1} \cdots \delta_{\sigma'_N \sigma_N}\, \delta_{n'_1 n_1} \cdots \delta_{n'_N n_N} \times \delta^3(\mathbf{p}'_1 - \mathbf{p}_1) \cdots \delta^3(\mathbf{p}'_N - \mathbf{p}_N) ,$$

with the $n$s labeling species and the $\sigma$s labeling spin z-components or helicities.
(For identical bosons or fermions it is necessary to respectively symmetrize or
antisymmetrize the products on the right-hand side.) We seek to calculate the
probability that the interaction $V$ will cause a state that looks at very early times
like the free-particle state $\varphi_\alpha$ to look at very late times like some other
free-particle state $\varphi_\beta$.

To pursue this calculation, we consider an eigenfunction $\psi_\alpha$ of the full
Hamiltonian (5.6.25) with energy $E_\alpha$:

$$H \psi_\alpha = E_\alpha \psi_\alpha . \qquad (5.6.28)$$

We can incorporate this condition along with our initial condition in what is
known as the Lippmann–Schwinger equation$^{24}$:

$$\psi_\alpha = \varphi_\alpha + (E_\alpha - H_0 + i\epsilon)^{-1}\, V \psi_\alpha , \qquad (5.6.29)$$

with $\epsilon$ a positive-definite infinitesimal quantity that makes $(E_\alpha - H_0 + i\epsilon)^{-1}$
well-defined even though $E_\alpha$ is within the spectrum of eigenvalues of $H_0$.
(The general reason for taking $\epsilon$ positive will be revealed shortly.) Multiplying
Eq. (5.6.29) with the operator $E_\alpha - H_0$ and using Eq. (5.6.26), we see that
$(E_\alpha - H_0)\psi_\alpha = V\psi_\alpha$, so any $\psi_\alpha$ that satisfies Eq. (5.6.29) also satisfies
Eq. (5.6.28).


24 B. Lippmann and J. Schwinger, Phys. Rev. 79, 469 (1950).

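To check the initial condition, we need to consider the time dependence of a
packet of the wave functions $\psi_\alpha$. If we expand $V\psi_\alpha$ as an integral over
free-particle wave functions $\varphi_\beta$ and use Eqs. (5.6.26) and (5.6.27), Eq. (5.6.29)
becomes

$$\psi_\alpha = \varphi_\alpha + \int d\beta\, \frac{\varphi_\beta \int \varphi_\beta^* V \psi_\alpha}{E_\alpha - E_\beta + i\epsilon} , \qquad (5.6.30)$$

where the integral over $\beta$ includes an integration over all three-momenta in the
state represented by $\varphi_\beta$ and a sum over all species and spin labels on the particles
in this state. The time dependence of a packet of these wave functions is given
in the Schrödinger picture by

$$\psi^{(g)}(t) \equiv \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \psi_\alpha\, d\alpha = \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha + \int d\beta\, \varphi_\beta \int g(\alpha)\, d\alpha\, \frac{e^{-iE_\alpha t/\hbar} \int \varphi_\beta^* V \psi_\alpha}{E_\alpha - E_\beta + i\epsilon} , \qquad (5.6.31)$$

where $g(\alpha)$ is some smooth function of the momenta that may also depend on
the spin and species labels. It will be convenient to separate an integral over
energy from the second integral over $\alpha$, writing Eq. (5.6.31) as

$$\psi^{(g)}(t) = \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha + \int d\beta\, \varphi_\beta \int dE\, e^{-iEt/\hbar}\, \frac{G_\beta(E)}{E - E_\beta + i\epsilon} , \qquad (5.6.32)$$

where

$$G_\beta(E) \equiv \int d\alpha\, g(\alpha)\, \delta(E_\alpha - E) \int \varphi_\beta^* V \psi_\alpha . \qquad (5.6.33)$$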


Now let us take $t \to -\infty$. For $t < 0$ we can close the contour of integration
over $E$ in Eq. (5.6.32) with a very large semicircle in the upper half of the
complex plane, on which the factor $e^{-iEt/\hbar}$ makes the integrand negligible.
The integral over $E$ is then given by a sum of the residues of any singularities
of the integrand in the upper half of the complex plane. There may well be
such singularities, but for $t \to -\infty$ their residues are exponentially suppressed
by the same factor $e^{-iEt/\hbar}$. A singularity infinitesimally above the real axis
would not be suppressed in this way, but the energy at which the denominator
$E - E_\beta + i\epsilon$ vanishes is just below the real axis, and so does not contribute
to this contour integral. (This reveals why we took $\epsilon$ to be positive.) Hence the
integral over $E$ vanishes for $t \to -\infty$, so for very early times only the first term
in Eq. (5.6.32) survives:

$$\psi^{(g)}(t) \to \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha . \qquad (5.6.34)$$

This is what we mean when we say that at very early times the state represented
by $\psi_\alpha$ looks like the free-particle state represented by $\varphi_\alpha$, as was to be
shown.

What does this state look like at very late times? For $t > 0$ we can only
close the contour of integration over $E$ with a very large contour in the lower
half of the complex plane, on which the factor $e^{-iEt/\hbar}$ is now negligible.
The residues of any singularities of $G_\beta(E)$ at a finite distance below the real
axis are exponentially suppressed for $t \to +\infty$ by the same factor. But now
the singularity at $E = E_\beta - i\epsilon$ does contribute to the integral. The contour
of integration goes clockwise around this singularity, so this integral equals
$-2\pi i\, G_\beta(E_\beta - i\epsilon) \exp([-iE_\beta - \epsilon]t/\hbar)$. As long as we take $\epsilon \to 0$ before we
take $t \to +\infty$, we can drop the $\epsilon$ here, so the integral over $E$ in Eq. (5.6.32)
equals $-2\pi i\, G_\beta(E_\beta) \exp(-iE_\beta t/\hbar)$, and Eq. (5.6.32) then gives

$$\psi^{(g)}(t) \to \int g(\alpha)\, e^{-iE_\alpha t/\hbar}\, \varphi_\alpha\, d\alpha - 2\pi i \int d\beta\, G_\beta(E_\beta)\, \varphi_\beta \exp(-iE_\beta t/\hbar)$$

for $t \to +\infty$. Using Eq. (5.6.33), this is

$$\psi^{(g)}(t) \to \int g(\alpha)\, d\alpha \int d\beta\, S_{\beta\alpha} \exp(-iE_\beta t/\hbar)\, \varphi_\beta , \qquad (5.6.35)$$

where

$$S_{\beta\alpha} \equiv \delta(\beta - \alpha) - 2\pi i\, \delta(E_\beta - E_\alpha) \int \varphi_\beta^* V \psi_\alpha . \qquad (5.6.36)$$

So, in the same sense as in the case $t \to -\infty$, Eq. (5.6.35) shows that the
state represented by $\psi_\alpha$ looks at $t \to +\infty$ as a superposition $\int d\beta\, S_{\beta\alpha} \varphi_\beta$. The
coefficient (5.6.36) is known as the S-matrix and is the central object of study
in modern scattering theory.

But experiments do not measure probability amplitudes. They measure
probabilities, or the rates at which probabilities change. However, we cannot just set
the probability for the transition $\alpha \to \beta$ equal to $|S_{\beta\alpha}|^2$. Even if we consider
a process for which $\alpha \neq \beta$, so that we can drop the term $\delta(\beta - \alpha)$ in $S_{\beta\alpha}$,
the S-matrix element will still be proportional to the energy-conservation delta
function $\delta(E_\beta - E_\alpha)$, whose square is not well-defined. Also, in the most
common case, where no external fields affect the transition $\alpha \to \beta$, momentum is
conserved, so

$$\int \varphi_\beta^* V \psi_\alpha = \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, M_{\beta\alpha} , \qquad (5.6.37)$$

where $\mathbf{P}$ here denotes the total momentum of the state and $M_{\beta\alpha}$ is some
amplitude that is not singular when $\mathbf{P}_\beta = \mathbf{P}_\alpha$. So we have to worry about the square
of $\delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)$ as well as the square of $\delta(E_\beta - E_\alpha)$.

For a completely convincing way of dealing with these problems, we would 
need to take superpositions of states with a range of energies and momenta, and 
follow the evolution of these wave packets from very early to very late times. 
We will adopt a much simpler approach that gives the right answers with a 
minimum of trouble. 

First, to deal with the inevitable energy-conservation delta function, we adopt
the fiction that the interaction $V$ acts only for a long but finite time interval of
duration $T$. This should not introduce significant errors if this interval extends
back in time to long before the particles in state $\alpha$ become close to one another,
and extends forward in time to long after the particles in state $\beta$ have been close
to one another.$^{25}$ In this case, the one-dimensional version of the representation
(5.6.9) of the delta function becomes instead

$$\delta_T(E_\beta - E_\alpha) = \frac{1}{2\pi\hbar} \int_T dt\, \exp(-it(E_\beta - E_\alpha)/\hbar) , \qquad (5.6.38)$$

the integral extending over the time interval of duration $T$. The square of the
delta function is then

$$\left[\delta_T(E_\beta - E_\alpha)\right]^2 = \delta_T(0)\, \delta_T(E_\beta - E_\alpha) = \left(\frac{T}{2\pi\hbar}\right) \delta_T(E_\beta - E_\alpha) .$$

As long as we do not attempt to measure energies to an uncertainty less than the
tiny amount $\hbar/T$, we can drop the subscript $T$ on the final delta function, and
write this as

$$\left[\delta_T(E_\beta - E_\alpha)\right]^2 = \left(\frac{T}{2\pi\hbar}\right) \delta(E_\beta - E_\alpha) . \qquad (5.6.39)$$
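
The replacement of the squared delta function by $T/2\pi\hbar$ times a single delta function can be checked numerically in the integrated sense. The sketch below is an illustration only and makes an assumption the text does not: the time interval is taken to be symmetric, $-T/2 < t < T/2$, so that the integral (5.6.38) evaluates to $\sin(\Delta E\, T/2\hbar)/(\pi \Delta E)$. The function names and the trapezoidal integrator are likewise introduced here.

```python
import math

def delta_T(dE, T, hbar=1.0):
    """Eq. (5.6.38) evaluated for the symmetric interval -T/2 < t < T/2."""
    if dE == 0.0:
        return T / (2.0 * math.pi * hbar)   # the value delta_T(0)
    return math.sin(dE * T / (2.0 * hbar)) / (math.pi * dE)

def integrate(func, a, b, n):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

T = 50.0
# Integral of delta_T over energy: close to 1, as for a true delta function.
norm = integrate(lambda dE: delta_T(dE, T), -40.0, 40.0, 80_000)
# Integral of delta_T squared: close to T / (2 pi hbar), as Eq. (5.6.39) implies.
norm_sq = integrate(lambda dE: delta_T(dE, T) ** 2, -40.0, 40.0, 80_000)
print(norm, norm_sq, T / (2.0 * math.pi))
```

Both integrals are cut off at a finite energy range, so they agree with their limiting values only up to small tail corrections that shrink as the range or $T$ grows.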

Likewise, in the absence of external fields momentum is conserved; to
deal with the momentum-conservation delta function we imagine that the
system is enclosed in a box of large but finite volume $V$. The representation
(5.6.9) of the momentum-conservation delta function in Eq. (5.6.37) (now with
momentum and position taking the place of position and wave vector) is then
replaced with

$$\delta^3_V(\mathbf{P}_\beta - \mathbf{P}_\alpha) = \frac{1}{(2\pi\hbar)^3} \int_V d^3x\, \exp(i\mathbf{x} \cdot (\mathbf{P}_\beta - \mathbf{P}_\alpha)/\hbar) , \qquad (5.6.40)$$

the integral running over the interior of the box. The square of this delta
function is

$$\left[\delta^3_V(\mathbf{P}_\beta - \mathbf{P}_\alpha)\right]^2 = \delta^3_V(0)\, \delta^3_V(\mathbf{P}_\beta - \mathbf{P}_\alpha) = \frac{V}{(2\pi\hbar)^3}\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) , \qquad (5.6.41)$$

in which we drop the subscript $V$ in the final expression because the
uncertainty in measurements of momenta is generally larger than the tiny amount
$\hbar V^{-1/3}$. Hence, putting together Eqs. (5.6.36), (5.6.37), (5.6.39), and (5.6.41),
the probability of a transition $\alpha \to \beta$ with $\alpha \neq \beta$ occurring in a time $T$ in the
volume $V$ is

$$P(\alpha \to \beta) = \left|S^{\rm box}_{\beta\alpha}\right|^2 = \left(\frac{T}{2\pi\hbar}\right) \left(\frac{V}{(2\pi\hbar)^3}\right) \delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha) \left|2\pi M^{\rm box}_{\beta\alpha}\right|^2 . \qquad (5.6.42)$$


25 For a decay process with a single-particle initial state we must take the duration T of the time interval
sufficiently large that the interval extends back in time close enough to the time when the particle was
produced, so that it had not yet had time to decay, and far enough forward in time that if the particle
has decayed by then its decay products will have had time to separate far enough that they are no longer
interacting.


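A superscript “box” has been attached to the matrix elements $S_{\beta\alpha}$ and $M_{\beta\alpha}$
because putting the system in a box changes the way that we must normalize
the wave functions $\varphi_\alpha$ and $\varphi_\beta$. Without a box, the wave function for a particle
of momentum $\mathbf{p}$ far from any interaction is taken as $\varphi_{\mathbf{p}}(\mathbf{x}) = \exp(i\mathbf{p} \cdot \mathbf{x}/\hbar)/(2\pi\hbar)^{3/2}$,
so that $\int d^3x\, \varphi^*_{\mathbf{p}'}(\mathbf{x})\, \varphi_{\mathbf{p}}(\mathbf{x}) = \delta^3(\mathbf{p} - \mathbf{p}')$, but in a box of volume $V$
we must instead take $\varphi_{\mathbf{p}}(\mathbf{x}) = \exp(i\mathbf{p} \cdot \mathbf{x}/\hbar)/\sqrt{V}$, so that the integral of $|\varphi_{\mathbf{p}}(\mathbf{x})|^2$
over the volume of the box is unity. Thus the matrix element for the transition
$\alpha \to \beta$ in a box is related to the usual matrix element by

$$M^{\rm box}_{\beta\alpha} = \left(\frac{(2\pi\hbar)^3}{V}\right)^{(N_\alpha + N_\beta)/2} M_{\beta\alpha} , \qquad (5.6.43)$$

where $N_\alpha$ and $N_\beta$ are the numbers of particles in the initial and final states.
There is a further complication, that in a large box the final states are very close
together. According to Eq. (5.5.1), the number of allowed momentum values for
a single particle in a range $d^3p$ of momenta is $(V/(2\pi\hbar)^3)\, d^3p$, so the number
of momentum states in the range of final states is

$$dN(\beta) = \left(V/(2\pi\hbar)^3\right)^{N_\beta} d\beta \qquad (5.6.44)$$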


where $d\beta$ denotes a product of momentum-space volume elements $d^3p$ for each
particle in the final state. Using Eqs. (5.6.43) and (5.6.44) in Eq. (5.6.42), the
differential rate for transitions from an initial state $\alpha$ into a range $d\beta$ of final
states is

$$d\Gamma(\alpha \to \beta) \equiv \frac{\left|S^{\rm box}_{\beta\alpha}\right|^2 dN(\beta)}{T} = \left(\frac{2\pi}{\hbar}\right) \left(\frac{V}{(2\pi\hbar)^3}\right)^{1 - N_\alpha} |M_{\beta\alpha}|^2\, \delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta . \qquad (5.6.45)$$

This is the master formula for calculating the rates for all sorts of transitions
between free-particle states.

The factor $(V/(2\pi\hbar)^3)^{1 - N_\alpha}$ in Eq. (5.6.45) may look peculiar, but it is in
fact just what is needed to account for what is measured. For a decay process
with $N_\alpha = 1$ this factor is of course absent, corresponding to the obvious fact
that the decay rate of a particle does not depend on the size of the box in
which it is contained. For a two-particle initial state $\alpha$, the differential rate
of the scattering $\alpha \to \beta$ into an arbitrary final state $\beta$ is proportional to the
flux, the product of the relative velocity $u_\alpha$ and the number density $1/V$ of
either particle as seen from the other, and is therefore written as the flux times
a differential cross section $d\sigma(\alpha \to \beta)$. (For a pair of non-relativistic particles
$u_\alpha = |\mathbf{p}_1/m_1 - \mathbf{p}_2/m_2|$, while if one of the particles is a photon then $u_\alpha = c$.)
Hence Eq. (5.6.45) gives

$$d\sigma(\alpha \to \beta) \equiv \frac{d\Gamma(\alpha \to \beta)}{u_\alpha/V} = \frac{(2\pi)^4 \hbar^2}{u_\alpha}\, |M_{\beta\alpha}|^2\, \delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta . \qquad (5.6.46)$$


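To clarify the meaning of the closing factor $\delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta$ in
Eqs. (5.6.45) and (5.6.46), consider a process $\alpha \to \beta$ in the center-of-mass
system, with $\mathbf{P}_\alpha = 0$, where $\beta$ is a state of two particles with momenta $\mathbf{p}'_A$ and
$\mathbf{p}'_B$ and masses $m'_A$ and $m'_B$. The closing factor in Eq. (5.6.46) is here

$$\delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta = \delta(E'_A + E'_B - E_\alpha)\, \delta^3(\mathbf{p}'_A + \mathbf{p}'_B)\, d^3p'_A\, d^3p'_B .$$

When we integrate over the final momenta the momentum-conservation delta
function directs us to set $\mathbf{p}'_A = -\mathbf{p}'_B \equiv \mathbf{p}$, so

$$\delta(E_\beta - E_\alpha)\, \delta^3(\mathbf{P}_\beta - \mathbf{P}_\alpha)\, d\beta \to p^2\, dp\, d\Omega\; \delta\!\left((p^2c^2 + m'^2_A c^4)^{1/2} + (p^2c^2 + m'^2_B c^4)^{1/2} - E_\alpha\right) ,$$

where $\mathbf{p}$ is in the solid angle $d\Omega$. There is a general rule that since for an
arbitrary increasing function $f(p)$ which takes a value $f_0$ at a single point $p_0$
we have $1 = \int \delta(f(p) - f_0)\, df(p)$, it follows that

$$\delta(f(p) - f_0) = \delta(p - p_0)/f'(p_0) . \qquad (5.6.47)$$

In our case, this means that when we integrate over $p$, we are directed to set
$p = p_\beta$, where

$$(p_\beta^2 c^2 + m'^2_A c^4)^{1/2} + (p_\beta^2 c^2 + m'^2_B c^4)^{1/2} = E_\alpha , \qquad (5.6.48)$$

and Eq. (5.6.46) becomes

$$d\sigma(\alpha \to \beta) = \frac{(2\pi)^4 \hbar^2 p_\beta^2}{u_\alpha u_\beta}\, |M_{\beta\alpha}|^2\, d\Omega , \qquad (5.6.49)$$

in which it is understood that, in the center-of-mass system, $M_{\beta\alpha}$ is to be
evaluated by placing $\mathbf{p}'_A = -\mathbf{p}'_B$ in the infinitesimal solid angle $d\Omega$, with
$|\mathbf{p}'_A| = |\mathbf{p}'_B| = p_\beta$, and

$$u_\beta \equiv \frac{p_\beta c^2}{(p_\beta^2 c^2 + m'^2_A c^4)^{1/2}} + \frac{p_\beta c^2}{(p_\beta^2 c^2 + m'^2_B c^4)^{1/2}} . \qquad (5.6.50)$$

Of course in the center-of-mass system the initial relative velocity $u_\alpha$ in
Eq. (5.6.49) is given by similar formulas but with $\beta$ replaced with $\alpha$ and
the final masses $m'_A$ and $m'_B$ replaced with initial masses $m_A$ and $m_B$:

$$u_\alpha = \frac{p_\alpha c^2}{(p_\alpha^2 c^2 + m_A^2 c^4)^{1/2}} + \frac{p_\alpha c^2}{(p_\alpha^2 c^2 + m_B^2 c^4)^{1/2}} , \qquad (5.6.51)$$

where

$$(p_\alpha^2 c^2 + m_A^2 c^4)^{1/2} + (p_\alpha^2 c^2 + m_B^2 c^4)^{1/2} = E_\alpha . \qquad (5.6.52)$$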
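
The kinematics of Eqs. (5.6.47) to (5.6.50) can be illustrated with a short numerical sketch. It is not part of the original text. It solves Eq. (5.6.48) for $p_\beta$ using the standard closed form for two-body decay momentum, and then checks the delta-function rule (5.6.47) by replacing the delta function with a narrow Gaussian. The function names, the Gaussian `width`, the cutoff `p_max`, and the units with $c = 1$ are all assumptions made for the illustration.

```python
import math

def total_energy(p, mA, mB, c=1.0):
    """Left-hand side of Eq. (5.6.48): sum of the two relativistic energies."""
    return (math.sqrt(p * p * c * c + mA * mA * c**4)
            + math.sqrt(p * p * c * c + mB * mB * c**4))

def final_momentum(E, mA, mB, c=1.0):
    """Closed-form solution of Eq. (5.6.48) for p_beta at total energy E."""
    threshold = (mA + mB) * c * c
    if E < threshold:
        raise ValueError("total energy is below the two-particle threshold")
    s = E * E
    return math.sqrt((s - threshold**2) * (s - ((mA - mB) * c * c) ** 2)) / (2.0 * E * c)

def relative_velocity(p, mA, mB, c=1.0):
    """Eq. (5.6.50): this is also d(total_energy)/dp, the f'(p0) of Eq. (5.6.47)."""
    return (p * c * c / math.sqrt(p * p * c * c + mA * mA * c**4)
            + p * c * c / math.sqrt(p * p * c * c + mB * mB * c**4))

def smeared_phase_space(E, mA, mB, c=1.0, width=1e-3, p_max=10.0, steps=200_000):
    """Integral of p^2 * delta_w(total_energy(p) - E) dp, delta_w a narrow Gaussian."""
    h = p_max / steps
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h                      # midpoint rule
        x = (total_energy(p, mA, mB, c) - E) / width
        total += p * p * norm * math.exp(-0.5 * x * x)
    return total * h

E = total_energy(1.5, 1.0, 2.0)                # energy chosen so that p_beta = 1.5
p_beta = final_momentum(E, 1.0, 2.0)
expected = p_beta**2 / relative_velocity(p_beta, 1.0, 2.0)
print(p_beta, expected, smeared_phase_space(E, 1.0, 2.0))
```

The smeared integral approaches $p_\beta^2/u_\beta$ as `width` shrinks, which is the factor that turns Eq. (5.6.46) into Eq. (5.6.49).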


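We can now see how our earlier results for scattering by a fixed potential
emerge from this general formalism. Consider an elastic non-relativistic
scattering process, in which $m'_A = m_A \equiv m$ and $m'_B = m_B \gg m$. In this
case $p_\alpha = p_\beta$, $u_\alpha = u_\beta = p_\alpha/m$, and $E_\alpha - m_A c^2 - m_B c^2 = p_\alpha^2/2m$.
Equation (5.6.49) then gives the differential cross section

$$\frac{d\sigma(\alpha \to \beta)}{d\Omega} = (2\pi)^4 \hbar^2 m^2\, |M_{\beta\alpha}|^2 . \qquad (5.6.53)$$

To calculate the matrix element $M_{\beta\alpha}$, we note that the final free-particle wave
function is

$$\varphi_\beta(\mathbf{x}_A, \mathbf{x}_B) = \frac{e^{i\mathbf{p}'_A \cdot \mathbf{x}_A/\hbar}}{(2\pi\hbar)^{3/2}}\, \frac{e^{i\mathbf{p}'_B \cdot \mathbf{x}_B/\hbar}}{(2\pi\hbar)^{3/2}} ,$$

and in the center-of-mass system the initial interacting wave function takes the
form

$$\psi_\alpha(\mathbf{x}_A, \mathbf{x}_B) = \psi(\mathbf{x}_A - \mathbf{x}_B) \times \frac{1}{(2\pi\hbar)^{3/2}} ,$$

where $\psi$ is the wave function discussed in the main body of this section (which
already includes a normalization factor $(2\pi\hbar)^{-3/2}$), and the second factor takes
care of the normalization of the heavy particle wave function. Then, setting
$\mathbf{x}_A = \mathbf{x} + \mathbf{x}_B$ and integrating over $\mathbf{x}_B$,

$$\int \varphi_\beta^* V \psi_\alpha = \int \varphi_\beta^*(\mathbf{x}_A, \mathbf{x}_B)\, V(\mathbf{x}_A - \mathbf{x}_B)\, \psi_\alpha(\mathbf{x}_A, \mathbf{x}_B)\, d^3x_A\, d^3x_B = \delta^3(\mathbf{p}'_A + \mathbf{p}'_B) \int d^3x\, \frac{e^{-i\mathbf{p}'_A \cdot \mathbf{x}/\hbar}}{(2\pi\hbar)^{3/2}}\, V(\mathbf{x})\, \psi(\mathbf{x}) ,$$

so

$$M_{\beta\alpha} = \int d^3x\, \frac{e^{-i\mathbf{p}'_A \cdot \mathbf{x}/\hbar}}{(2\pi\hbar)^{3/2}}\, V(\mathbf{x})\, \psi(\mathbf{x}) . \qquad (5.6.54)$$

Using Eq. (5.6.54) in Eq. (5.6.53) gives the same differential cross section
(5.6.17) as found earlier.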

It is frequently observed that the cross section for some reaction is a function
of energy with a sharp peak. This is a sign of a resonance, the formation of
a slowly decaying intermediate state in the scattering process. Suppose the
integral $\int \varphi_\beta^* V \psi_\alpha$ in Eq. (5.6.36) for the S-matrix has a term with an energy
dependence proportional to $(E_\alpha - E_R + i\hbar\Gamma/2)^{-1}$, with $E_R$ and $\Gamma$ real and
$\Gamma > 0$. This yields a term in the function $G_\beta(E)$ defined by Eq. (5.6.33) with
energy dependence proportional to $(E - E_R + i\hbar\Gamma/2)^{-1}$, which has a pole in
the lower half of the complex $E$ plane. Although, as noted in the derivation of
Eq. (5.6.35), the contribution of any singularity in $G_\beta(E)$ at an energy $E$ at a
finite distance below the real axis vanishes for $t \to +\infty$, if the singularity is
close to the real axis then this contribution lasts a long time. So if $\Gamma$ is
relatively small then the integral over $E$ in Eq. (5.6.32) contains a term that decays
slowly, with a time dependence proportional to $\exp(-iE_R t/\hbar) \exp(-\Gamma t/2)$,
giving a term in $|\psi^{(g)}(t)|^2$ that decays as $\exp(-\Gamma t)$, indicating the presence
of an intermediate state whose probability decays at a rate $\Gamma$. The singular term
in $\int \varphi_\beta^* V \psi_\alpha$ gives a term in the cross section with energy dependence

$$\sigma \propto \left| \frac{1}{E - E_R + i\hbar\Gamma/2} \right|^2 = \frac{1}{(E - E_R)^2 + \hbar^2\Gamma^2/4} . \qquad (5.6.55)$$

So, this is the general rule for resonances: the decay rate $\Gamma$ of the intermediate
state is the full width in energy of the resonant peak in the cross section at half
maximum, divided by $\hbar$.
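
The rule relating width and decay rate can be checked directly from the energy dependence in Eq. (5.6.55). The sketch below is illustrative only: it locates the half-maximum point of the resonance shape by bisection and returns the full width, which should equal $\hbar\Gamma$. The function names and the bisection bracket of ten widths are assumptions introduced here.

```python
def resonance_shape(E, E_R, Gamma, hbar=1.0):
    """Energy dependence of Eq. (5.6.55), up to an overall constant."""
    return 1.0 / ((E - E_R) ** 2 + (hbar * Gamma) ** 2 / 4.0)

def full_width_at_half_maximum(E_R, Gamma, hbar=1.0, iterations=200):
    """Find where the shape falls to half its peak value, by bisection above E_R."""
    half_peak = 0.5 * resonance_shape(E_R, E_R, Gamma, hbar)
    lo, hi = E_R, E_R + 10.0 * hbar * Gamma    # the half-maximum point lies in between
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if resonance_shape(mid, E_R, Gamma, hbar) > half_peak:
            lo = mid
        else:
            hi = mid
    # The peak is symmetric about E_R, so the full width is twice the half width.
    return 2.0 * (0.5 * (lo + hi) - E_R)

width = full_width_at_half_maximum(E_R=5.0, Gamma=0.3, hbar=2.0)
print(width, width / 2.0)   # full width in energy, and the decay rate width / hbar
```

Dividing the measured full width by $\hbar$ recovers the decay rate $\Gamma$ that was put in.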


5.7 Canonical Formalism 


Until now we have followed de Broglie in representing the momentum of
a particle as $-i\hbar$ times the gradient with respect to the particle's position,
so that the wave function representing a state with definite momentum $\mathbf{p}$ is
$\propto \exp(i\mathbf{p} \cdot \mathbf{x}/\hbar)$. From this, we obtained the commutation relation among the
operators $\mathbf{X}$ and $\mathbf{P}$ that represent position and momentum, for instance, for a
single particle,

$$[X_i, P_j] = i\hbar\delta_{ij} , \qquad [X_i, X_j] = [P_i, P_j] = 0 , \qquad (5.7.1)$$

where in the Heisenberg picture $\mathbf{P} = m\dot{\mathbf{X}}$. This has been adequate in
dealing with charged particles moving in an electrostatic potential but not in more
complicated contexts, such as the case of charged particles moving in general
classical electromagnetic fields, discussed in the next section, much less for a
quantum theory of fields. Also, in using commutation relations like Eq. (5.7.1),
we must wonder (or at least we should wonder) why these relations are valid.


Hamiltonian Formalism 


There is a more general approach, known as the canonical formalism, according
to which the continuous degrees of freedom (excluding spin) of any system are
represented by a set of canonical variables $Q_a$ (such as all the components of
the positions of all the particles in a system) and an equal number of “canonical
conjugates” $P_a$. Like any operators, in the Heisenberg picture these operators
satisfy the equations of motion (5.3.34):

$$i\hbar \frac{d}{dt} Q_a(t) = [Q_a(t), H] , \qquad i\hbar \frac{d}{dt} P_a(t) = [P_a(t), H] , \qquad (5.7.2)$$

where $H = H(Q(t), P(t))$ is the Hamiltonian of the system. On the basis of
previous experience with classical phenomena, we commonly need to require
that these equations of motion take the same form as the Hamiltonian equations
of motion in classical mechanics:

$$\frac{d}{dt} Q_a(t) = \frac{\partial}{\partial P_a(t)} H(Q(t), P(t)) , \qquad (5.7.3)$$

$$\frac{d}{dt} P_a(t) = -\frac{\partial}{\partial Q_a(t)} H(Q(t), P(t)) . \qquad (5.7.4)$$

For instance, for a particle of mass $m$ in a potential $V(\mathbf{X})$, the variables $Q_a$ are
the components of the position vector $\mathbf{X}$, the Hamiltonian is

$$H(\mathbf{X}, \mathbf{P}) = \frac{\mathbf{P}^2}{2m} + V(\mathbf{X}) ,$$

and the equations of motion (5.7.3) and (5.7.4) are

$$\frac{d\mathbf{X}}{dt} = \frac{\mathbf{P}}{m} , \qquad \frac{d\mathbf{P}}{dt} = -\nabla V(\mathbf{X}) ,$$

as in Newtonian mechanics. In order to guarantee that the equations of motion
(5.7.3) and (5.7.4) follow from the equations (5.7.2) of the Heisenberg picture,
we impose the canonical commutation relations

$$[Q_a(t), P_b(t)] = i\hbar\delta_{ab} , \qquad [Q_a(t), Q_b(t)] = [P_a(t), P_b(t)] = 0 . \qquad (5.7.5)$$

To see that this works, recall that as remarked in Section 5.3 commutation
is algebraically like differentiation. It follows from the commutation relations
(5.7.5) that for any function $F(Q, P)$ of the $Q$s and $P$s,

$$\left[Q_a(t), F(Q(t), P(t))\right] = i\hbar \frac{\partial}{\partial P_a(t)} F(Q(t), P(t)) , \qquad (5.7.6)$$

$$\left[P_a(t), F(Q(t), P(t))\right] = -i\hbar \frac{\partial}{\partial Q_a(t)} F(Q(t), P(t)) . \qquad (5.7.7)$$

So, by taking $F = H$ it follows trivially from the Heisenberg picture equations
(5.7.2) and the commutation relations (5.7.5) that the $Q$s and $P$s satisfy the
Hamiltonian equations of motion (5.7.3) and (5.7.4). This is why we impose
these commutation relations.
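
The statement that commutation acts like differentiation can be made concrete with a small computation. The sketch below is an illustration under stated assumptions, not the text's method: wave functions are taken to be polynomials in a single coordinate $x$, stored as coefficient lists, with $X$ acting as multiplication by $x$ and $P$ as $-i\hbar\, d/dx$. It checks the first relation in (5.7.5) and the instance of Eq. (5.7.6) with $F = P^2$. All function names and the choice $\hbar = 1$ are introduced here.

```python
HBAR = 1.0  # units with hbar = 1 (an arbitrary choice for the illustration)

def apply_x(coeffs):
    """Multiply psi(x) = sum_n coeffs[n] * x**n by x."""
    return [0j] + [complex(c) for c in coeffs]

def apply_p(coeffs):
    """Apply P = -i hbar d/dx to psi(x)."""
    return [-1j * HBAR * n * coeffs[n] for n in range(1, len(coeffs))] or [0j]

def apply_p_squared(coeffs):
    """Apply P^2, that is, -hbar^2 d^2/dx^2."""
    return apply_p(apply_p(coeffs))

def subtract(a, b):
    """Coefficient-wise a - b, padding the shorter list with zeros."""
    size = max(len(a), len(b))
    a = list(a) + [0j] * (size - len(a))
    b = list(b) + [0j] * (size - len(b))
    return [u - v for u, v in zip(a, b)]

def commutator(op_a, op_b, coeffs):
    """Coefficients of [A, B] psi = A(B psi) - B(A psi)."""
    return subtract(op_a(op_b(coeffs)), op_b(op_a(coeffs)))

def is_zero(coeffs, tol=1e-12):
    return all(abs(c) < tol for c in coeffs)

psi = [1.0, 2.0, 3.0]  # psi(x) = 1 + 2x + 3x^2

# [X, P] psi should equal i hbar psi, the first relation in Eq. (5.7.5).
residual_xp = subtract(commutator(apply_x, apply_p, psi),
                       [1j * HBAR * c for c in psi])
# [X, P^2] psi should equal i hbar d(P^2)/dP psi = 2 i hbar P psi, as in Eq. (5.7.6).
residual_xp2 = subtract(commutator(apply_x, apply_p_squared, psi),
                        [2j * HBAR * c for c in apply_p(psi)])
print(is_zero(residual_xp), is_zero(residual_xp2))
```

Polynomials are used only because their derivatives are exact in this representation; they are not normalizable wave functions, so the check is purely algebraic.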

Of course, since operators in the Heisenberg and Schrödinger pictures are
related by Eq. (5.3.35), the commutation relations for the Schrödinger-picture
operators $Q_a$ and $P_a$ are the same as for the Heisenberg-picture operators $Q_a(t)$
and $P_a(t)$.

It is in order to satisfy the canonical commutation relations (5.7.5) that in
wave mechanics we represent the momentum vector by the operator $-i\hbar\nabla$.
What for de Broglie and Schrödinger was just a guess is a necessary
consequence of the canonical formalism. But there are cases where the canonical
conjugates $P_a$ are not simply masses times velocities but take a different form,
as dictated by the Hamiltonian equation (5.7.3). In such cases, it is the quantities
$P_a$ and not masses times velocities that must be represented as gradients.

For instance, consider a particle that experiences a momentum-dependent
interaction, with Hamiltonian

$$H = \frac{\mathbf{P}^2}{2m} + \frac{1}{2} \mathbf{P} \cdot \mathbf{V}(\mathbf{X}) + \frac{1}{2} \mathbf{V}(\mathbf{X}) \cdot \mathbf{P} , \qquad (5.7.8)$$

where $\mathbf{V}$ is some vector function of position. (Since $P_i$ does not commute
with $X_i$, we need to average over orderings of $\mathbf{P}$ and $\mathbf{V}(\mathbf{X})$ in order for the
Hamiltonian to be self-adjoint.) Here Eq. (5.7.3) tells us that the momentum is
not just the mass times the velocity, but instead

$$\mathbf{P}(t) = m \left[ \frac{d\mathbf{X}}{dt} - \mathbf{V}(\mathbf{X}) \right] . \qquad (5.7.9)$$

Nevertheless, it is $\mathbf{P}$ and not $m\, d\mathbf{X}/dt$ that must be represented in wave
mechanics by $-i\hbar\nabla$, in order to satisfy the first commutation relation (5.7.5).
In particular, the time-dependent Schrödinger equation here reads

$$i\hbar \frac{\partial}{\partial t} \psi(\mathbf{x}, t) = -\frac{\hbar^2}{2m} \nabla^2 \psi(\mathbf{x}, t) - \frac{i\hbar}{2} \nabla \cdot [\mathbf{V}(\mathbf{x})\, \psi(\mathbf{x}, t)] - \frac{i\hbar}{2} \mathbf{V}(\mathbf{x}) \cdot \nabla \psi(\mathbf{x}, t) . \qquad (5.7.10)$$


Lagrangian Formalism 


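There is another version of the canonical formalism, in quantum mechanics as
well as classical mechanics, based on a Lagrangian $L(Q, \dot{Q})$ taken as a function
of canonical variables $Q_a(t)$ and their time derivatives $\dot{Q}_a(t)$ rather than a
Hamiltonian function of canonical variables and their canonical conjugates. The
fundamental assumption of the Lagrangian formalism is that a quantity known
as the action

$$I \equiv \int_{-\infty}^{+\infty} L(Q(t), \dot{Q}(t))\, dt \qquad (5.7.11)$$

is unaffected by infinitesimal shifts in the functions $Q_a(t)$ that vanish at
$t \to \pm\infty$. To use this assumption, note that when $Q_a(t)$ is changed to
$Q_a(t) + \delta Q_a(t)$ with $\delta Q_a(t)$ infinitesimal, the change in the action is

$$\delta I = \sum_a \int_{-\infty}^{+\infty} \left[ \frac{\partial L(Q(t), \dot{Q}(t))}{\partial Q_a(t)}\, \delta Q_a(t) + \frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)}\, \frac{d}{dt} \delta Q_a(t) \right] dt .$$

In the case where $\delta Q_a(t)$ vanishes at $t \to \pm\infty$, integrating the second term in
the integrand by parts gives

$$\delta I = \sum_a \int_{-\infty}^{+\infty} \left[ \frac{\partial L(Q(t), \dot{Q}(t))}{\partial Q_a(t)} - \frac{d}{dt} \frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)} \right] \delta Q_a(t)\, dt ,$$

and since this is assumed to vanish for arbitrary variations $\delta Q_a(t)$ that vanish at
$t \to \pm\infty$, we must have

$$\frac{d}{dt} \left( \frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)} \right) = \frac{\partial L(Q(t), \dot{Q}(t))}{\partial Q_a(t)} . \qquad (5.7.12)$$

These are the equations of motion in the Lagrangian formalism.
From this, we can go over to the classical Hamiltonian formalism, defining

$$P_a(t) \equiv \frac{\partial L(Q(t), \dot{Q}(t))}{\partial \dot{Q}_a(t)} \qquad (5.7.13)$$

with Hamiltonian

$$H(Q, P) \equiv \sum_a \dot{Q}_a P_a - L(Q, \dot{Q}) . \qquad (5.7.14)$$

(Taken literally, this may not put the $Q$s and $P$s in the right order for $H$ to
be self-adjoint, in which case we must average over their ordering to make $H$
self-adjoint as we did in Eq. (5.7.8).) In Eq. (5.7.14) we should regard $\dot{Q}_a$ as a
function of the $Q$s and $P$s, given by solving Eq. (5.7.13) for $\dot{Q}_a$. We can then
check that the $Q$s and $P$s satisfy the Hamiltonian equations of motion

$$\frac{\partial H(Q, P)}{\partial Q_a} = \sum_b \frac{\partial \dot{Q}_b}{\partial Q_a} P_b - \frac{\partial L(Q, \dot{Q})}{\partial Q_a} - \sum_b \frac{\partial L(Q, \dot{Q})}{\partial \dot{Q}_b} \frac{\partial \dot{Q}_b}{\partial Q_a} = -\frac{\partial L(Q, \dot{Q})}{\partial Q_a} = -\dot{P}_a$$

and

$$\frac{\partial H(Q, P)}{\partial P_a} = \dot{Q}_a + \sum_b \frac{\partial \dot{Q}_b}{\partial P_a} P_b - \sum_b \frac{\partial L(Q, \dot{Q})}{\partial \dot{Q}_b} \frac{\partial \dot{Q}_b}{\partial P_a} = \dot{Q}_a ,$$

as was to be shown.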
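
As a worked illustration of Eqs. (5.7.13) and (5.7.14), consider the following classical sketch, which ignores operator ordering. The Lagrangian is chosen here for the purpose and is not given in the text: it is picked so that the result matches the momentum-dependent Hamiltonian (5.7.8). Take

$$L = \frac{m}{2} \left( \dot{\mathbf{X}} - \mathbf{V}(\mathbf{X}) \right)^2 .$$

Eq. (5.7.13) gives the canonical conjugate

$$\mathbf{P} = \frac{\partial L}{\partial \dot{\mathbf{X}}} = m \left( \dot{\mathbf{X}} - \mathbf{V}(\mathbf{X}) \right) ,$$

which is Eq. (5.7.9), and solving for the velocity gives $\dot{\mathbf{X}} = \mathbf{P}/m + \mathbf{V}(\mathbf{X})$. Using this in Eq. (5.7.14),

$$H = \dot{\mathbf{X}} \cdot \mathbf{P} - L = \left( \frac{\mathbf{P}}{m} + \mathbf{V}(\mathbf{X}) \right) \cdot \mathbf{P} - \frac{\mathbf{P}^2}{2m} = \frac{\mathbf{P}^2}{2m} + \mathbf{V}(\mathbf{X}) \cdot \mathbf{P} ,$$

which is the classical counterpart of Eq. (5.7.8), before averaging over the ordering of $\mathbf{P}$ and $\mathbf{V}(\mathbf{X})$.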


Noether’s Theorem 


The chief reason for using the Lagrangian formalism to construct a Hamiltonian
is that there is a deep relation between conservation laws and symmetries of
the Lagrangian, first stated in classical physics$^{26}$ by Amalie Emmy Noether
(1882–1935). Let us consider a symmetry of the Lagrangian under an
infinitesimal transformation that for simplicity takes the $Q$s into functions of $Q$s:

$$Q_a \to Q_a + \epsilon f_a(Q) , \qquad \dot{Q}_a \to \dot{Q}_a + \epsilon \sum_b \frac{\partial f_a}{\partial Q_b} \dot{Q}_b , \qquad (5.7.15)$$

where the $f_a(Q)$ are some functions only of the $Q$s that are dictated, up to a
constant factor, by the nature of the symmetry principle, and $\epsilon$ is an infinitesimal
parameter. (Time-independent rotations and translations of coordinates are of
this general form.) The invariance of $L$ under this transformation tells us that

$$0 = \sum_a \left[ \frac{\partial L}{\partial Q_a} f_a(Q) + \frac{\partial L}{\partial \dot{Q}_a} \frac{d}{dt} f_a(Q) \right] .$$

Using Eqs. (5.7.12) and (5.7.13), we see that this is a conservation law:

$$\frac{dF(Q, P)}{dt} = 0 \quad \text{where} \quad F(Q, P) \equiv \sum_a P_a f_a(Q) . \qquad (5.7.16)$$


26 E. Noether, Nachr. König. Gesell. Wiss. zu Göttingen, Math.-Phys. Klasse 235 (1918).




Not only is $F$ conserved; in quantum mechanics it generates the symmetry
with which we began, in the sense that

$$[F, Q_a] = -i\hbar f_a(Q) \qquad (5.7.17)$$

or, equivalently, for infinitesimal $\epsilon$,

$$\exp[i\epsilon F/\hbar]\, Q_a \exp[-i\epsilon F/\hbar] = Q_a + \epsilon f_a(Q) , \qquad (5.7.18)$$

which is just the transformation (5.7.15).

For instance, if we take the canonical variables $Q_a$ as the $i$th components $X_{ni}$
of the coordinate vectors $\mathbf{X}_n$ of particles distinguished by a label $n$, and if as
usual the Lagrangian for a multi-particle system depends only on velocities and
differences of coordinate vectors, then $L$ is invariant under the transformation
$X_{ni} \to X_{ni} + \epsilon_i$, with the same infinitesimal vector $\boldsymbol{\epsilon}$ for each particle label $n$,
and Eq. (5.7.16) gives a conserved quantity,

$$\mathbf{P} = \sum_n \mathbf{P}_n .$$

This of course is the total momentum, and generates the translation symmetry,
in the sense that

$$[\boldsymbol{\epsilon} \cdot \mathbf{P}, X_{nj}] = -i\hbar \epsilon_j .$$

A similar analysis uses the assumed rotational invariance of the Lagrangian to
give the usual formula for the total angular momentum of any system that does
not involve spin. But note that invariance under the Galilean transformation
$\mathbf{X} \to \mathbf{X} + \mathbf{u}t$ does not lead to a conservation law because, unlike translation or
rotation, this transformation involves the time.


5.8 Charged Particles in Electromagnetic Fields 


We now turn to the quantum theory of a charged particle moving in classical 
electric and magnetic fields. This theory will provide us in this section with 
a good example of the use of the canonical formalism, and as we will see in 
the following section this theory played an important part in understanding the 
effect of external magnetic fields on atomic spectra. 


Scalar and Vector Potentials 


It is frequently convenient in classical electrodynamics to write the electric and
magnetic fields as linear combinations of derivatives of a vector potential $\mathbf{A}(\mathbf{x}, t)$
and a scalar potential $\phi(\mathbf{x}, t)$:

$$\mathbf{E} = -\frac{1}{c} \dot{\mathbf{A}} - \nabla\phi , \qquad \mathbf{B} = \nabla \times \mathbf{A} . \qquad (5.8.1)$$

This ensures that the fields satisfy the homogeneous Maxwell equations

$$\nabla \times \mathbf{E} + \dot{\mathbf{B}}/c = 0 , \qquad \nabla \cdot \mathbf{B} = 0 , \qquad (5.8.2)$$

and leads to simplifications in the other Maxwell equations.

What in classical physics is merely a convenience, in quantum mechanics is 
a necessity. It is not possible to write a simple local Hamiltonian for a charged 
particle in general electric and magnetic fields using just the fields E and B. But 
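we can write such Hamiltonians in terms of $\mathbf{A}$ and $\phi$. For a single non-relativistic
particle of mass $m$ and charge $e$, the Hamiltonian is


$$H(\mathbf{X},\mathbf{P}) = \frac{1}{2m}\left[\mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{X},t)\right]^2 + e\phi(\mathbf{X},t). \tag{5.8.3}$$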


Whether or not we derive this Hamiltonian from a Lagrangian, its real justi- 
fication is that it leads to the correct equations of motion. The Hamiltonian 
equations of motion (5.7.3) and (5.7.4) here take the form 


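$$\dot{X}_i = \frac{\partial H}{\partial P_i} = \frac{1}{m}\left[P_i - \frac{e}{c}A_i(\mathbf{X},t)\right],$$

$$\dot{P}_i = -\frac{\partial H}{\partial X_i} = \frac{e}{mc}\left[P_j - \frac{e}{c}A_j(\mathbf{X},t)\right]\frac{\partial A_j(\mathbf{X},t)}{\partial X_i} - e\,\frac{\partial\phi(\mathbf{X},t)}{\partial X_i},$$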


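where the indices $i$, $j$, etc. run over the values 1, 2, 3, and repeated indices are
summed. Eliminating the momentum from these two equations (and dropping
arguments), we have an equation of motion for the position:

$$\begin{aligned}
m\ddot{X}_i &= \frac{e}{c}\,\dot{X}_j\frac{\partial A_j}{\partial X_i} - e\,\frac{\partial\phi}{\partial X_i} - \frac{e}{c}\left[\frac{\partial A_i}{\partial t} + \dot{X}_j\frac{\partial A_i}{\partial X_j}\right] \\
&= \frac{e}{c}\,\dot{X}_j\left[\frac{\partial A_j}{\partial X_i} - \frac{\partial A_i}{\partial X_j}\right] - e\,\frac{\partial\phi}{\partial X_i} - \frac{e}{c}\,\frac{\partial A_i}{\partial t}.
\end{aligned}$$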


To put this in a more familiar form, note that 


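$$\dot{X}_j\left[\frac{\partial A_j}{\partial X_i} - \frac{\partial A_i}{\partial X_j}\right] = \left[\dot{\mathbf{X}}\times(\nabla\times\mathbf{A})\right]_i.$$

(For instance, for $i = 3$ the left-hand side is

$$\dot{X}_1\left(\frac{\partial A_1}{\partial X_3} - \frac{\partial A_3}{\partial X_1}\right) + \dot{X}_2\left(\frac{\partial A_2}{\partial X_3} - \frac{\partial A_3}{\partial X_2}\right) = \dot{X}_1(\nabla\times\mathbf{A})_2 - \dot{X}_2(\nabla\times\mathbf{A})_1 = \left[\dot{\mathbf{X}}\times(\nabla\times\mathbf{A})\right]_3,$$

and likewise for $i = 1$ and $i = 2$.) Using the formulas (5.8.1) for $\mathbf{E}$ and $\mathbf{B}$, the
equation of motion takes the form


$$m\ddot{\mathbf{X}} = e\mathbf{E} + \frac{e}{c}\left[\dot{\mathbf{X}}\times\mathbf{B}\right], \tag{5.8.4}$$

which we recognize as the equation of motion (4.6.23) dictated by Lorentz
invariance, to first order in $|\dot{\mathbf{X}}|/c$.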
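
The claim that the Hamiltonian (5.8.3) reproduces the Lorentz force can also be checked numerically. The following is a minimal sketch, not part of the original derivation: it integrates Hamilton's equations for a uniform magnetic field along the $z$-axis, with the assumed choices $\mathbf{A} = -\frac{1}{2}\mathbf{X}\times\mathbf{B}$, $\phi = 0$, and units in which $m = e/c = B = 1$. The function names and the Runge-Kutta integrator are illustrative choices. According to Eq. (5.8.4) the velocity should then rotate with angular frequency $eB/mc = 1$.

```python
import math

# Assumed units: mass M = 1, Q = e/c = 1, uniform field B = 1 along z.
M, Q, B = 1.0, 1.0, 1.0

def derivatives(state):
    """Hamilton's equations for H = (P - Q A)^2 / 2M with A = -(1/2) X x B, phi = 0."""
    x, y, px, py = state
    ax, ay = -0.5 * B * y, 0.5 * B * x               # in-plane components of A
    vx, vy = (px - Q * ax) / M, (py - Q * ay) / M    # dX_i/dt = dH/dP_i
    dpx = Q * vy * (0.5 * B)                         # dP_x/dt = Q v_j dA_j/dx
    dpy = Q * vx * (-0.5 * B)                        # dP_y/dt = Q v_j dA_j/dy
    return (vx, vy, dpx, dpy)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = derivatives(state)
    k2 = derivatives(shifted(state, k1, dt / 2))
    k3 = derivatives(shifted(state, k2, dt / 2))
    k4 = derivatives(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def evolve(state, t_final, steps):
    dt = t_final / steps
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

def velocity(state):
    """Kinetic velocity (P - Q A)/M, which differs from P/M when A is non-zero."""
    x, y, px, py = state
    return ((px + 0.5 * Q * B * y) / M, (py - 0.5 * Q * B * x) / M)

start = (0.0, 0.0, 1.0, 0.0)                           # at the origin, velocity (1, 0)
after_quarter = velocity(evolve(start, math.pi / 2, 500))
after_period = velocity(evolve(start, 2 * math.pi, 2000))
print(after_quarter, after_period)
```

With these assumptions the Lorentz force law gives the velocity $(\cos t, -\sin t)$, so the velocity should be close to $(0, -1)$ after a quarter period and return to $(1, 0)$ after a full period.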


Gauge Transformations 


There is more than one set of potentials $\mathbf{A}$ and $\phi$ that give the same fields $\mathbf{E}$ and
$\mathbf{B}$. Given a set of potentials $\mathbf{A}$ and $\phi$ that yield a set of fields $\mathbf{E}$ and $\mathbf{B}$, we can
always find other potentials

$$\mathbf{A}' = \mathbf{A} + \nabla\xi, \qquad \phi' = \phi - \frac{1}{c}\frac{\partial\xi}{\partial t}, \tag{5.8.5}$$

which give the same fields for an arbitrary function $\xi(\mathbf{x},t)$. A given choice of
potentials is called a choice of gauge, and Eq. (5.8.5) is known as a gauge
transformation. Even though the equation of motion (5.8.4) derived from the
Hamiltonian (5.8.3) involves only the fields $\mathbf{E}$ and $\mathbf{B}$, the Hamiltonian depends
on $\mathbf{A}$ and $\phi$ and is not gauge invariant. So it is important to observe that no
physical implications of this Hamiltonian depend on the choice of gauge.

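Let us check this for the simple case of a time-independent gauge
transformation function $\xi(\mathbf{X})$, which has no effect on $\phi$. The gauge-transformed
Hamiltonian is

$$H' = \frac{1}{2m}\left[\mathbf{P} - \frac{e}{c}\mathbf{A} - \frac{e}{c}\nabla\xi\right]^2 + e\phi. \tag{5.8.6}$$

Define an operator

$$U(\mathbf{X}) \equiv \exp\left(-\frac{ie}{\hbar c}\,\xi(\mathbf{X})\right).$$

According to Eq. (5.7.7),

$$[\mathbf{P}, U(\mathbf{X})] = -\frac{e}{c}\,\nabla\xi(\mathbf{X})\,U(\mathbf{X}),$$

and therefore

$$U^{-1}(\mathbf{X})\,\mathbf{P}\,U(\mathbf{X}) = \mathbf{P} - \frac{e}{c}\,\nabla\xi(\mathbf{X}).$$

It follows that


$$H'(\mathbf{X},\mathbf{P}) = U^{-1}(\mathbf{X})\,H(\mathbf{X},\mathbf{P})\,U(\mathbf{X}). \tag{5.8.7}$$


So if $\psi(\mathbf{x})$ satisfies the time-independent Schrödinger equation $H\psi = E\psi$ for
energy $E$, then the gauge-transformed Schrödinger equation $H'\psi' = E\psi'$ is
satisfied for the same energy, with gauge-transformed wave function


$$\psi'(\mathbf{x}) = U^{-1}(\mathbf{x})\,\psi(\mathbf{x}) = \exp\left(\frac{ie}{\hbar c}\,\xi(\mathbf{x})\right)\psi(\mathbf{x}). \tag{5.8.8}$$


Not only the energy but also the probability density $|\psi|^2$ is unchanged by this
transformation.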
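
The operator identity behind this argument, that conjugating the momentum with $U$ shifts it by $-(e/c)\nabla\xi$, can be spot-checked numerically. The sketch below works in one dimension with $\hbar = 1$ and applies both sides to a test function using a central finite difference. The gauge function `xi`, the test function `psi`, and the value `Q` standing for $e/\hbar c$ are arbitrary assumptions made only for this illustration.

```python
import cmath
import math

Q = 0.7   # stands for e/(hbar c) in units with hbar = 1 (assumed value)

def xi(x):
    """An arbitrary smooth gauge function (assumption for this sketch)."""
    return math.sin(x)

def dxi(x):
    """Derivative of xi."""
    return math.cos(x)

def psi(x):
    """An arbitrary smooth complex test wave function (assumption)."""
    return cmath.exp(-x * x + 0.5j * x)

def U(x):
    """U(x) = exp(-i Q xi(x)), a pure phase."""
    return cmath.exp(-1j * Q * xi(x))

def momentum(f, x, h=1e-5):
    """P f = -i df/dx, approximated by a central difference (hbar = 1)."""
    return -1j * (f(x + h) - f(x - h)) / (2 * h)

def conjugated_momentum(x):
    """(U^{-1} P U psi)(x)."""
    return momentum(lambda y: U(y) * psi(y), x) / U(x)

def shifted_momentum(x):
    """((P - Q dxi/dx) psi)(x)."""
    return momentum(psi, x) - Q * dxi(x) * psi(x)

def psi_transformed(x):
    """Gauge-transformed wave function U^{-1} psi."""
    return psi(x) / U(x)

for x in (-0.4, 0.3, 1.1):
    print(abs(conjugated_momentum(x) - shifted_momentum(x)),
          abs(psi_transformed(x)) - abs(psi(x)))
```

Both printed differences should be zero up to discretization and rounding error, reflecting that $U$ is a pure phase and so cannot change $|\psi|^2$.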


Magnetic Interactions


Now let us take the simplest example of magnetic interactions, a one-electron 
atom in a uniform time-independent magnetic field B. We can take the vector 
potential here as 


$$\mathbf{A} = -\frac{1}{2}\,\mathbf{X}\times\mathbf{B},$$


for which $\nabla\times\mathbf{A} = \mathbf{B}$. Of course this is not unique, but as we have seen this
makes no difference.
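
A short numerical check of this choice of vector potential is sketched here. It evaluates the curl of $\mathbf{A} = -\frac{1}{2}\mathbf{X}\times\mathbf{B}$ by central differences at an arbitrary point, together with the divergence, which also vanishes for this choice. The field value and the sample point are assumptions chosen only for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

B_FIELD = (0.3, -1.2, 2.0)   # an arbitrary uniform field (assumed value)

def vector_potential(x):
    """A(X) = -(1/2) X x B."""
    return tuple(-0.5 * c for c in cross(x, B_FIELD))

def partial(field, i, j, x, h=1e-4):
    """Central-difference estimate of d(field_i)/d(x_j) at the point x."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (field(xp)[i] - field(xm)[i]) / (2 * h)

def curl(field, x):
    return (partial(field, 2, 1, x) - partial(field, 1, 2, x),
            partial(field, 0, 2, x) - partial(field, 2, 0, x),
            partial(field, 1, 0, x) - partial(field, 0, 1, x))

def divergence(field, x):
    return sum(partial(field, i, i, x) for i in range(3))

point = (0.5, 1.5, -0.7)     # arbitrary sample point
print(curl(vector_potential, point), divergence(vector_potential, point))
```

Because $\mathbf{A}$ is linear in $\mathbf{X}$, the central differences are exact up to rounding, so the curl should reproduce `B_FIELD` and the divergence should vanish.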

The factor 1/c multiplying the vector potential in Eq. (5.8.3) makes the mag- 
netic term in the Hamiltonian generally very small. To first order in this term, it 
shifts the Hamiltonian (5.8.3) by 


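$$\Delta H = \frac{e}{m_e c}\,\mathbf{A}(\mathbf{X})\cdot\mathbf{P} = -\frac{e}{2m_e c}\,[\mathbf{X}\times\mathbf{B}]\cdot\mathbf{P} = \frac{e}{2m_e c}\,\mathbf{B}\cdot\mathbf{L}, \tag{5.8.9}$$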


where L = X x P is the orbital angular momentum operator. (Here e has been 
changed to —e, because in the usual notation this is the charge of the electron. 
Also, we have not had to worry about the order of the operators A(X) and P, 
because in this choice of gauge, $\nabla\cdot\mathbf{A} = 0$.)


Spin Coupling 


What about spin? The form of the interaction (5.8.9) suggests that there should 
also be a similar term in the magnetic interaction Hamiltonian with the spin 
operator S in place of L, and not necessarily with the same coefficient. The 
magnetic interaction is therefore taken to be in the form 


$$\Delta H = \frac{e}{2m_e c}\,\mathbf{B}\cdot[\mathbf{L} + g_e\mathbf{S}], \tag{5.8.10}$$


where $g_e$ is a dimensionless coefficient known as the gyromagnetic ratio of
the electron. It was first calculated in 1928 on the basis of a relativistic theory
of the electron by Dirac,27 who found the value $g_e = 2$. The development of
quantum electrodynamics after World War II led to a calculation28 of a radiative
correction due to the emission and reabsorption of a photon by the electron
while it is interacting with the magnetic field. This gave $g_e = 2 \times 1.00116$, in
good agreement with experiment.

The effect of the interaction (5.8.10) on atomic energy levels in a magnetic 
field is described in the next section. 


27 P. A. M. Dirac, Proc. Roy. Soc. A 117, 610 (1928).
28 J. Schwinger, Phys. Rev. 73, 416 (1948). 


5.9 Perturbation Theory


There are few problems in quantum mechanics that can be solved exactly. For- 
tunately it is often possible to find useful approximate solutions by a technique 
known as perturbation theory. Sometimes it happens that the results obtained 
in this way are more revealing than would be provided by a more complicated 
exact solution, even where one is available. 

The basis of perturbation theory is the assumption that the Hamiltonian can 
be divided into two parts: 


$$H = H_0 + H', \tag{5.9.1}$$


where $H_0$ is simple enough to allow exact solutions of the Schrödinger equation,
and $H'$ is in some sense small. We have already used a Hamiltonian of this type
to derive the Born approximation for scattering amplitudes in Section 5.6. In
this section we shall concentrate on deriving approximations for energy levels
and the corresponding wave functions, assuming that $H'$ is small enough to
allow the eigenfunctions and eigenvalues of $H$ to be usefully expressed as power
series in $H'$. That is, in the Schrödinger equation $H\psi = E\psi$ we write


$$\psi = \psi_0 + \psi_1 + \psi_2 + \cdots, \qquad E = E_0 + E_1 + E_2 + \cdots, \tag{5.9.2}$$


where $\psi_N$ and $E_N$ are of $N$th order in $H'$. The Schrödinger equation then takes
the form


$$(H_0 + H')(\psi_0 + \psi_1 + \psi_2 + \cdots) = (E_0 + E_1 + E_2 + \cdots)(\psi_0 + \psi_1 + \psi_2 + \cdots). \tag{5.9.3}$$


In the $N$th order of perturbation theory we keep all terms in Eq. (5.9.3) up to
$N$th order in $H'$. To zeroth order in $H'$, this is the unperturbed Schrödinger
equation


$$H_0\psi_0 = E_0\psi_0, \tag{5.9.4}$$


whose solutions we assume are known. 


First-Order Perturbation Theory 


Keeping only terms in Eq. (5.9.3) of first order in $H'$ and taking $\psi_0$ to satisfy
the zeroth-order equation (5.9.4), the Schrödinger equation becomes


$$H_0\psi_1 + H'\psi_0 = E_0\psi_1 + E_1\psi_0. \tag{5.9.5}$$


To find the first-order term $E_1$ in the energy, multiply Eq. (5.9.5) with $\psi_0^*$ and
integrate and sum over all coordinates and spin 3-components. Because $H_0$ is a
Hermitian operator, we have

$$\int\psi_0^* H_0\psi_1 = \int(H_0\psi_0)^*\psi_1 = E_0\int\psi_0^*\psi_1,$$

so the terms in this integral involving $\psi_1$ cancel, and we have

$$E_1\int\psi_0^*\psi_0 = \int\psi_0^* H'\psi_0,$$

or, if $\psi_0$ is normalized,


$$E_1 = \int\psi_0^* H'\psi_0. \tag{5.9.6}$$


Very nice, but this does not necessarily work in the case where $E_0$ is a
degenerate energy eigenvalue, with several independent eigenfunctions $\psi^{(n)}$:


$$H_0\psi^{(n)} = E_0\psi^{(n)}. \tag{5.9.7}$$


It is convenient to choose these eigenfunctions to be orthonormal:

$$\int\psi^{(n)*}\psi^{(m)} = \delta_{nm}. \tag{5.9.8}$$


Multiply Eq. (5.9.5) with any of the $\psi^{(n)*}$, integrate, and sum over all
coordinates and spin 3-components, and again use the fact that $H_0$ is Hermitian,
so that $\int\psi^{(n)*}H_0\psi_1 = E_0\int\psi^{(n)*}\psi_1$. The terms in this integral involving $\psi_1$
again cancel, and we have


$$\int\psi^{(n)*}H'\psi_0 = E_1\int\psi^{(n)*}\psi_0. \tag{5.9.9}$$


The difficulty is that with more than one independent solution $\psi^{(n)}$ of Eq. (5.9.7),
whatever we choose for our unperturbed wave function $\psi_0$, we can always
choose some linear combination $\sum_n c_n\psi^{(n)}$ of these eigenfunctions to be
orthogonal to $\psi_0$, in the sense that $\int\left[\sum_n c_n\psi^{(n)}\right]^*\psi_0 = 0$, so that the same
linear combination of Eq. (5.9.9) gives a condition on $H'$:


$$\int\Big[\sum_n c_n\psi^{(n)}\Big]^* H'\psi_0 = 0, \tag{5.9.10}$$


which in general need not be the case.

To avoid this contradiction, we must make an appropriate choice of the
zeroth-order eigenfunction $\psi_0$. What we need is to choose $\psi_0$ so that any linear
combination of the degenerate wave functions $\psi^{(n)}$ that is orthogonal to $\psi_0$ will
also be orthogonal to $H'\psi_0$. Because $H'$ is a Hermitian operator, the integrals
$H'_{nm} \equiv \int\psi^{(n)*}H'\psi^{(m)}$ form a Hermitian matrix, in the sense that $H'^{*}_{nm} = H'_{mn}$.
According to a general theorem of matrix algebra, it is always possible
to replace the $\psi^{(n)}$ with linear combinations for which the orthonormality
condition (5.9.8) is still satisfied, and now $H'_{nm}$ is diagonal:


$$\int\psi^{(n)*}H'\psi^{(m)} = \begin{cases}\mathcal{E}_n & n = m \\ 0 & n \neq m\end{cases} \tag{5.9.11}$$


for some real $\mathcal{E}_n$. We must take the zeroth-order solution to be one of these
redefined eigenfunctions, say $\psi^{(n)}$, so that if we multiply Eq. (5.9.5) with
the complex conjugate of any linear combination $\sum_{m\neq n} c_m\psi^{(m)}$ of the other
degenerate eigenfunctions that is orthogonal to $\psi^{(n)}$, Eq. (5.9.11) implies that
Eq. (5.9.10) is also necessarily satisfied, and there is no contradiction. (We will
see an example of this procedure in our treatment below of the Zeeman effect.)
With the zeroth-order wave function $\psi_0 = \psi^{(n)}$, Eq. (5.9.6) gives $E_1 = \mathcal{E}_n$.
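
The need to diagonalize the perturbation within a degenerate level can be seen in the smallest possible example. The sketch below is an illustration under stated assumptions, not taken from the text: it uses a two-state degenerate level with a purely off-diagonal real perturbation of strength `b`. A naive choice of zeroth-order state violates the condition (5.9.10), while the combinations that diagonalize the perturbation give first-order shifts equal to the exact ones.

```python
import math

E0 = 1.0    # degenerate unperturbed energy (assumed value)
b = 0.05    # strength of the perturbation (assumed value)

# Matrix of H' in the original degenerate basis: purely off-diagonal.
H_PRIME = [[0.0, b], [b, 0.0]]

def apply(matrix, v):
    """Matrix times vector for 2 x 2 real matrices."""
    return [sum(matrix[i][j] * v[j] for j in range(2)) for i in range(2)]

def overlap(u, v):
    """Scalar product of two real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Naive choice: psi_0 = first basis state. The state orthogonal to it is the
# second basis state, and condition (5.9.10) would require this to vanish.
naive_psi0 = [1.0, 0.0]
orthogonal_partner = [0.0, 1.0]
violation = overlap(orthogonal_partner, apply(H_PRIME, naive_psi0))

# Combinations that diagonalize H' within the degenerate level.
s = 1.0 / math.sqrt(2.0)
psi_plus, psi_minus = [s, s], [s, -s]
shift_plus = overlap(psi_plus, apply(H_PRIME, psi_plus))      # first-order E_1
shift_minus = overlap(psi_minus, apply(H_PRIME, psi_minus))   # first-order E_1
off_diagonal = overlap(psi_minus, apply(H_PRIME, psi_plus))   # vanishes, as in (5.9.11)

print(violation, shift_plus, shift_minus, off_diagonal)
```

In this example the exact eigenvalues of $H_0 + H'$ are $E_0 \pm b$, so first-order perturbation theory with the properly chosen states happens to be exact, whereas the naive choice would give $E_1 = 0$.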

We can get a further insight into the necessity of a suitable choice of the 
zeroth-order wave function by considering a problem of some importance in its 
own right, the calculation of the first-order contribution to the wave function. 
Let us introduce a complete orthonormal set of solutions $\varphi_a$ of the zeroth-order
Schrödinger equation


$$H_0\varphi_a = E_a\varphi_a, \qquad \int\varphi_a^*\varphi_b = \delta_{ab}. \tag{5.9.12}$$


Multiply Eq. (5.9.5) by $\varphi_a^*$ and integrate and sum over all coordinates and spins.
Since $H_0$ is Hermitian the first term gives $\int\varphi_a^* H_0\psi_1 = E_a\int\varphi_a^*\psi_1$, and so


$$(E_0 - E_a)\int\varphi_a^*\psi_1 = \int\varphi_a^* H'\psi_0 - E_1\int\varphi_a^*\psi_0. \tag{5.9.13}$$


For $E_a = E_0$, Eq. (5.9.13) makes no sense unless $\int\varphi_a^* H'\psi_0$ vanishes for every
such wave function orthogonal to $\psi_0$, which is accomplished by taking $\psi_0$ to be
one of the wave functions $\psi^{(n)}$ for which Eq. (5.9.11) is satisfied. On the other
hand, for $E_a \neq E_0$, $\varphi_a$ is orthogonal to $\psi_0$ so Eq. (5.9.13) gives a formula that
is valid for any $\varphi_a$ for which $E_a \neq E_0$:


$$\int\varphi_a^*\psi_1 = \frac{\int\varphi_a^* H'\psi_0}{E_0 - E_a} \quad\text{for } E_a \neq E_0. \tag{5.9.14}$$


In the case where the eigenvalue $E_0$ of $H_0$ is not degenerate, $\psi_0$ and the
functions $\varphi_a$ with $E_a \neq E_0$ form a complete set, so we can expand $\psi_1$ as


$$\psi_1 = \alpha\psi_0 + \sum_{a:E_a\neq E_0}\varphi_a\int\varphi_a^*\psi_1 = \alpha\psi_0 + \sum_{a:E_a\neq E_0}\varphi_a\,\frac{\int\varphi_a^* H'\psi_0}{E_0 - E_a},$$

with the complex number $\alpha$ the only component of $\psi_1$ that is still unknown.
We can always take $\alpha$ to be real, because any change in the imaginary part of
$\alpha$ needed to make $\alpha$ real has no effect on $\psi_0 + \psi_1$ if it is compensated by a
first-order change in the phase of $\psi_0$, which we are free to choose as we like.
With $\alpha$ real, to first order the norm of $\psi_0 + \psi_1$ is


$$\int|\psi_0 + \psi_1|^2 = (1 + 2\alpha)\int|\psi_0|^2.$$


So, if we normalize $\psi_0$ and require the wave function to remain normalized in
first order, then we must have $\alpha = 0$. The first-order shift in the wave function
is then finally


$$\psi_1 = \sum_{a:E_a\neq E_0}\varphi_a\,\frac{\int\varphi_a^* H'\psi_0}{E_0 - E_a}. \tag{5.9.15}$$


Note that if the parameters of the theory are changed so that one of the $E_a$
approaches $E_0$, then the corresponding component of the wave function
becomes very large, invalidating perturbation theory, unless in this limit
$\int\varphi_a^* H'\psi_0$ becomes very small. So even approximate degeneracy can be a
problem.

In the case of degeneracy Eq. (5.9.13) tells us nothing about the components
of $\psi_1$ along the $\varphi_a$ with $E_a = E_0$, and the normalization condition on
$\psi_0 + \psi_1$ does not determine these components either. For this, it is necessary to
invoke the condition that the changes of the wave function in higher orders of
perturbation theory are small. We will not pursue this aspect here.


Zeeman Effect 


For an example of the use of perturbation theory, let us return to the Zeeman
effect, mentioned at the end of the previous section. Here $H_0$ is the Hamiltonian
of an alkali metal atom, considering the outermost electron to move in an
effective potential arising from the charges of the nucleus and all other electrons,
with no external fields. To calculate the effect of a weak external magnetic field
$\mathbf{B}$, we consider a first-order perturbation given by Eq. (5.8.10):


$$H' = \frac{e}{2m_e c}\,\mathbf{B}\cdot[\mathbf{L} + g_e\mathbf{S}], \tag{5.9.16}$$


where $g_e \simeq 2$ is the gyromagnetic ratio of the electron. The eigenfunctions of
$H_0$ may be labeled $\psi_{nj\ell M}$. Here


$$\mathbf{J}^2\psi_{nj\ell M} = \hbar^2 j(j+1)\,\psi_{nj\ell M}, \qquad \mathbf{L}^2\psi_{nj\ell M} = \hbar^2\ell(\ell+1)\,\psi_{nj\ell M}, \qquad J_z\psi_{nj\ell M} = \hbar M\,\psi_{nj\ell M}, \tag{5.9.17}$$


where $M$ runs by unit steps from $-j$ to $+j$, and $n - \ell - 1$ is the number of
nodes of the wave function. The states with a given $n$, $j$, and $\ell$ but varying $M$
all have the same energy, so the eigenstates of $H_0$ are all degenerate, except for
those with $j = 0$. For a magnetic field in an arbitrary direction the operator $H'$
in general includes terms proportional to $L_z$ and $S_z$, which do commute with
$J_z$, but also $L_x$, $L_y$, $S_x$, and $S_y$, which do not commute with $J_z$, so there will be
non-vanishing components of $\int\psi^*_{nj\ell M'}H'\psi_{nj\ell M}$ with $M' \neq M$, and first-order
perturbation theory will not work if we take the zeroth-order wave function to
be one of the $\psi_{nj\ell M}$.29

The cure is obvious. Take the zeroth-order wave function to be an eigenstate
of the component of $\mathbf{J}$ in the direction of $\mathbf{B}$. Or, to save writing, just continue
to use the $\psi_{nj\ell M}$ as zeroth-order wave functions but from the beginning choose
the coordinate system so that the $z$-axis is in the direction of $\mathbf{B}$. In this case, the
first-order shift in the energy is given by Eq. (5.9.6) as


$$E_1(n\ell jM) = \frac{eB}{2m_e c}\int\psi^*_{nj\ell M}\,[L_z + g_e S_z]\,\psi_{nj\ell M}. \tag{5.9.18}$$


It is easiest to evaluate $E_1$ for $s$-wave states with $\ell = 0$, for which $j = 1/2$
and $M = \pm 1/2$. In this case Eq. (5.9.18) gives immediately


$$E_1(n\;0\;1/2\;\pm 1/2) = \pm\frac{e g_e B\hbar}{4m_e c}. \tag{5.9.19}$$


To deal with the general case with $\ell \neq 0$, we use a general property of angular
momentum multiplets. Let $\varphi_{jM}$ be any multiplet of $2j + 1$ wave functions,
with $\mathbf{J}^2\varphi_{jM} = \hbar^2 j(j+1)\varphi_{jM}$ and $J_z\varphi_{jM} = \hbar M\varphi_{jM}$, formed as described
in Section 5.4 by letting lowering operators $J_x - iJ_y$ act on a state with $M = j$.
For any vector operator $\mathbf{V}$, the integrals $\int\varphi^*_{jM'}V_i\varphi_{jM}$ can all be calculated
from any one of them by using the commutation relations of the raising and
lowering operators $J_x \pm iJ_y$ with the $V_i$ and the effect of these operators on the
multiplet $\varphi_{jM}$, none of which depends on the choice of the operator $\mathbf{V}$ or the
wave functions $\varphi_{jM}$, so in general the integrals $\int\varphi^*_{jM'}V_i\varphi_{jM}$ can depend only
on the specific choice of the operator $\mathbf{V}$ or the wave functions $\varphi_{jM}$ through an
overall factor. In particular, we have


$$\int\varphi^*_{jM'}V_i\,\varphi_{jM} = \alpha_V\int\varphi^*_{jM'}J_i\,\varphi_{jM}, \tag{5.9.20}$$


29 If it were not for the fine structure produced by spin-orbit coupling there would be an additional
degeneracy: the energies for states with the same $n$ and $\ell$ but different $j$ would be equal. The discussion
here of the Zeeman effect assumes that the magnetic field is sufficiently weak that the energy shift it
produces is small compared with the fine-structure splitting, in which case states with the same $n$ and $\ell$
but different $j$ are not effectively degenerate. But we are ignoring the even smaller hyperfine energy shifts
due to the interaction of the electron with the magnetic field of the nucleus.

In hydrogen there is a further degeneracy of states with the same $n$ and $j$ but different $\ell$, such as the
$2s_{1/2}$ and $2p_{1/2}$ states, which are separated only by the very small Lamb shift described in Section 5.4.
The treatment here applies to hydrogen only when the energy shift due to the interaction of the electron
with the external magnetic field is less than the Lamb shift but greater than the hyperfine splitting.
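
where the factor $\alpha_V$ will in general depend on the nature of the operator $\mathbf{V}$
and the wave functions $\varphi_{jM}$, but not on the vector index $i$ nor on the angular
momentum $z$-components $M$ and $M'$. This is an example of a general
quantum-mechanical result known as the Wigner-Eckart theorem.30

In its application to the Zeeman effect, Eq. (5.9.20) gives


$$\begin{aligned}
\int\psi^*_{nj\ell M'}L_i\,\psi_{nj\ell M} &= \alpha_L(nj\ell)\int\psi^*_{nj\ell M'}J_i\,\psi_{nj\ell M}, \\
\int\psi^*_{nj\ell M'}S_i\,\psi_{nj\ell M} &= \alpha_S(nj\ell)\int\psi^*_{nj\ell M'}J_i\,\psi_{nj\ell M}.
\end{aligned} \tag{5.9.21}$$


To calculate the coefficients $\alpha_L$ and $\alpha_S$, we use a trick. The wave functions
$J_k\psi_{nj\ell M}$ are linear combinations of the wave functions $\psi_{nj\ell M''}$ in the same
multiplet, so we can apply Eq. (5.9.21) also to these functions:


$$\begin{aligned}
\int\psi^*_{nj\ell M'}L_iJ_k\,\psi_{nj\ell M} &= \alpha_L(nj\ell)\int\psi^*_{nj\ell M'}J_iJ_k\,\psi_{nj\ell M}, \\
\int\psi^*_{nj\ell M'}S_iJ_k\,\psi_{nj\ell M} &= \alpha_S(nj\ell)\int\psi^*_{nj\ell M'}J_iJ_k\,\psi_{nj\ell M}.
\end{aligned} \tag{5.9.22}$$


Taking the wave functions $\psi_{nj\ell M}$ to be orthonormal, we have


$$\int\psi^*_{nj\ell M'}\,\mathbf{J}^2\,\psi_{nj\ell M} = \hbar^2 j(j+1)\,\delta_{M'M}.$$

Hence, setting $i = k$ and summing over $i$ in Eq. (5.9.22), we have


$$\begin{aligned}
\hbar^2 j(j+1)\,\alpha_L(nj\ell)\,\delta_{M'M} &= \int\psi^*_{nj\ell M'}\,\mathbf{L}\cdot\mathbf{J}\,\psi_{nj\ell M}, \\
\hbar^2 j(j+1)\,\alpha_S(nj\ell)\,\delta_{M'M} &= \int\psi^*_{nj\ell M'}\,\mathbf{S}\cdot\mathbf{J}\,\psi_{nj\ell M}.
\end{aligned}$$

Note that

$$\mathbf{L}\cdot\mathbf{J} = \frac{1}{2}\left[-(\mathbf{J} - \mathbf{L})^2 + \mathbf{J}^2 + \mathbf{L}^2\right] = \frac{1}{2}\left[-\mathbf{S}^2 + \mathbf{J}^2 + \mathbf{L}^2\right]$$

and likewise

$$\mathbf{S}\cdot\mathbf{J} = \frac{1}{2}\left[-\mathbf{L}^2 + \mathbf{J}^2 + \mathbf{S}^2\right],$$

so

$$\alpha_L(nj\ell) = \frac{-3/4 + j(j+1) + \ell(\ell+1)}{2j(j+1)}, \qquad
\alpha_S(nj\ell) = \frac{-\ell(\ell+1) + j(j+1) + 3/4}{2j(j+1)}. \tag{5.9.23}$$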


30 For a statement of this theorem and a detailed proof, see Section 4.1 of Weinberg, Lectures on Quantum 
Mechanics, listed in the bibliography. 

Using Eqs. (5.9.23) and (5.9.21) in Eq. (5.9.18) then gives the first-order Zeeman
energy shift:


$$E_1(n\ell jM) = \frac{eB\hbar M}{4m_e c\,j(j+1)}\Big[-3/4 + j(j+1) + \ell(\ell+1) + g_e\big[-\ell(\ell+1) + j(j+1) + 3/4\big]\Big]. \tag{5.9.24}$$
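
The coefficients in Eq. (5.9.23) and the resulting shift (5.9.24) are easy to tabulate exactly. The sketch below is an illustration whose function names are chosen here, with $g_e = 2$ assumed as a default. It computes $\alpha_L$, $\alpha_S$ and the dimensionless factor $\alpha_L + g_e\alpha_S$ that multiplies $eB\hbar M/2m_ec$, using exact rational arithmetic.

```python
from fractions import Fraction

def alpha_L(j, l):
    """alpha_L = [-3/4 + j(j+1) + l(l+1)] / [2 j(j+1)], as in Eq. (5.9.23)."""
    return (Fraction(-3, 4) + j * (j + 1) + l * (l + 1)) / (2 * j * (j + 1))

def alpha_S(j, l):
    """alpha_S = [-l(l+1) + j(j+1) + 3/4] / [2 j(j+1)], as in Eq. (5.9.23)."""
    return (-l * (l + 1) + j * (j + 1) + Fraction(3, 4)) / (2 * j * (j + 1))

def zeeman_factor(j, l, g_e=2):
    """Factor multiplying e B hbar M / (2 m_e c) in the first-order shift.

    g_e = 2 is an assumed default (the Dirac value); radiative corrections
    change it slightly.
    """
    return alpha_L(j, l) + g_e * alpha_S(j, l)

half = Fraction(1, 2)
for l, j in [(0, half), (1, half), (1, 3 * half), (2, 3 * half), (2, 5 * half)]:
    print(l, j, alpha_L(j, l), alpha_S(j, l), zeeman_factor(j, l))
```

Two consistency checks are built into the formulas: $\alpha_L + \alpha_S = 1$ for every allowed $j$ and $\ell$, as it must be since $\mathbf{L} + \mathbf{S} = \mathbf{J}$, and for $\ell = 0$, $j = 1/2$ the factor reduces to $g_e$, which reproduces Eq. (5.9.19).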


Second-Order Perturbation Theory 


In some cases the interesting effects of a perturbation $H'$ arise only in second or
even higher order. The terms in the Schrödinger equation (5.9.3) of second order
in $H'$ give

$$H_0\psi_2 + H'\psi_1 = E_0\psi_2 + E_1\psi_1 + E_2\psi_0. \tag{5.9.25}$$


To find $E_2$, multiply with $\psi_0^*$ and integrate and sum over all coordinates
and spins. Again using the fact that $H_0$ is Hermitian, we have $\int\psi_0^* H_0\psi_2 = E_0\int\psi_0^*\psi_2$,
so the terms involving $\psi_2$ cancel. Also, as we have seen, the normalization
condition for $\psi$ requires that $\int\psi_0^*\psi_1 = 0$, so the term proportional
to $E_1$ vanishes. This leaves


$$E_2 = \int\psi_0^* H'\psi_1. \tag{5.9.26}$$


In the case where the eigenfunction of $H_0$ with energy $E_0$ is not degenerate, we
can use Eq. (5.9.15), so that $E_2$ is given by a sum over all the other eigenfunctions
of $H_0$:

$$E_2 = \sum_{a:E_a\neq E_0}\frac{\left|\int\varphi_a^* H'\psi_0\right|^2}{E_0 - E_a}. \tag{5.9.27}$$
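
The first- and second-order formulas (5.9.6) and (5.9.27) can be compared with an exact result in a two-state toy model, where the eigenvalues follow from the quadratic formula. The sketch below is an illustration under assumed values: unperturbed energies `E0` and `EA`, and a real symmetric perturbation with matrix elements `LAM * A`, `LAM * B`, `LAM * C`, none of which come from the text.

```python
import math

E0, EA = 0.0, 1.0            # non-degenerate unperturbed energies (assumed)
LAM = 0.01                   # small expansion parameter (assumed)
A, B, C = 1.0, 2.0, -1.0     # H' = LAM * [[A, B], [B, C]] (assumed)

def exact_eigenvalue():
    """Exact eigenvalue of H0 + H' that reduces to E0 as LAM -> 0."""
    p, s, r = E0 + LAM * A, EA + LAM * C, LAM * B
    # E0 < EA here, so the relevant root of the quadratic is the lower one.
    return 0.5 * (p + s) - math.sqrt((0.5 * (p - s)) ** 2 + r * r)

def first_order():
    """E0 + E1, with E1 the diagonal matrix element of H', Eq. (5.9.6)."""
    return E0 + LAM * A

def second_order():
    """Adds E2 = |H'_{a0}|^2 / (E0 - Ea), Eq. (5.9.27), for the single other state."""
    return first_order() + (LAM * B) ** 2 / (E0 - EA)

print(exact_eigenvalue(), first_order(), second_order())
```

With these numbers the second-order estimate should differ from the exact eigenvalue only at order `LAM**3`, noticeably better than the first-order estimate.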


When field theorists say that the Lamb shift is due to the emission and
reabsorption of a photon by the electron in hydrogen they mean that this is a
second-order effect, in which the wave functions $\varphi_a$ in Eq. (5.9.27) represent
states containing an electron and a photon. Since these states form a continuum,
the sum over states involves an integral over the photon momentum, which
introduces infinities into the calculation. This calculation was completed only
in 1949, when it was recognized that the same second-order processes require
a redefinition of the mass and charge of the electron and of the photon and
electron fields, which leads to a cancellation of infinities.31


31 N. M. Kroll and W. E. Lamb, Phys. Rev. 75, 388 (1949); J. B. French and V. F. Weisskopf, Phys. Rev. 75, 
1240 (1949). 


5.10 Beyond Wave Mechanics


Our discussion of quantum mechanics in this chapter has so far been based 
on wave mechanics, in which physical states are represented by functions of 
particle positions and spins. This is too parochial a formalism. Why position, 
among all observable physical quantities? Indeed, we have already seen in 
Section 5.3 that a physical state can just as well be represented by a wave 
function depending on momenta (such as (5.3.20) for a one-particle system) as 
by a wave function depending on position. 

The study of other physical systems forces us much farther away from wave 
mechanics than merely substituting momenta for position as the argument of 
wave functions. The state of a field, such as the electromagnetic field, cannot 
be described in terms of the positions or the momenta of any fixed number of 
particles. It is partly as a preparation for our account of quantum field theory in 
Chapter 7 that we need to consider a formulation of quantum mechanics, due 
chiefly to Dirac,32 that is general enough to apply to any physical system.

In this general formulation, physical states are represented by state vectors
in an infinite-dimensional space, known as Hilbert space. Like ordinary vectors
in three dimensions, a linear combination $a_1\Psi_1 + a_2\Psi_2$ of two state vectors $\Psi_1$
and $\Psi_2$ is also a state vector, only here the numerical coefficients $a_1$ and $a_2$ can
be complex. Addition here has the same properties as the addition of complex
numbers, including associativity and commutativity and the existence of a zero
for which $0 + \Psi = \Psi + 0 = \Psi$. Also, as in Euclidean space, for any two
state vectors $\Psi$ and $\Phi$ there is a scalar product denoted $(\Phi, \Psi)$, here a complex
number, with the properties


$$(\Psi, \Phi) = (\Phi, \Psi)^*, \tag{5.10.1}$$
$$(\Phi, a_1\Psi_1 + a_2\Psi_2) = a_1(\Phi, \Psi_1) + a_2(\Phi, \Psi_2), \tag{5.10.2}$$
$$(\Psi, \Psi) \geq 0, \tag{5.10.3}$$


and $(\Psi, \Psi) = 0$ if and only if $\Psi = 0$. As we shall see, wave functions are
the components of these state vectors in one basis or another, and the integrals
(5.3.8) of products of these wave functions, abbreviated as $\int\psi^*\varphi$, are the scalar
products $(\Psi, \Phi)$ of the state vectors of which they are the components.

Observable quantities are represented in this formulation by linear operators
that act on state vectors rather than on wave functions. Here an operator $A$ being
“linear” means that for any state vectors $\Psi_1$ and $\Psi_2$ and complex numbers $a_1$
and $a_2$, we have


$$A(a_1\Psi_1 + a_2\Psi_2) = a_1 A\Psi_1 + a_2 A\Psi_2. \tag{5.10.4}$$


32 This approach is the basis of Dirac’s 1930 treatise, The Principles of Quantum Mechanics, listed in the
bibliography.

The adjoint of an operator $A$ is defined as an operator $A^\dagger$ for which

$$(\Phi, A^\dagger\Psi) = (A\Phi, \Psi). \tag{5.10.5}$$


Real observables are represented by operators that are self-adjoint, in the sense
that $A^\dagger = A$.

The first interpretive postulate of quantum mechanics is that a state
represented by a non-zero state vector $\Psi$ has a definite value $a$ for an observable
represented by an operator $A$ if and only if $\Psi$ is an eigenvector of $A$ with
eigenvalue $a$; that is,


$$A\Psi = a\Psi. \tag{5.10.6}$$


If $\Psi_1$ and $\Psi_2$ are non-zero eigenvectors of a self-adjoint operator $A$ with
eigenvalues $a_1$ and $a_2$ then


$$a_1(\Psi_2, \Psi_1) = (\Psi_2, A\Psi_1) = (A\Psi_2, \Psi_1) = a_2^*(\Psi_2, \Psi_1). \tag{5.10.7}$$


Taking $\Psi_1 = \Psi_2$ and then of course $a_1 = a_2$, we see that eigenvalues of
self-adjoint operators are real, while taking $a_1 \neq a_2$ and then of course $\Psi_1 \neq \Psi_2$,
we see that eigenvectors of a self-adjoint operator with different eigenvalues are
orthogonal, in the sense that $(\Psi_2, \Psi_1) = 0$.
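
Both conclusions can be illustrated with a small finite-dimensional stand-in for Hilbert space. The sketch below uses an arbitrarily chosen $2\times 2$ Hermitian matrix (an assumption for illustration only), with state vectors as pairs of complex numbers and the scalar product taken as antilinear in its first argument, consistent with Eqs. (5.10.1) and (5.10.2).

```python
import cmath

# An arbitrary 2 x 2 self-adjoint (Hermitian) matrix, chosen for illustration.
A = [[2, 1 - 1j], [1 + 1j, 3]]

def apply(matrix, v):
    """Matrix acting on a two-component state vector."""
    return [matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1]]

def scalar(phi, psi):
    """Scalar product (phi, psi): conjugate-linear in phi, linear in psi."""
    return sum(p.conjugate() * q for p, q in zip(phi, psi))

# Eigenvalues from the characteristic polynomial a^2 - (trace) a + det = 0.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(trace * trace - 4 * det)
a1, a2 = (trace + root) / 2, (trace - root) / 2

def eigenvector(a):
    """Solves the first row of (A - a) v = 0; valid because A[0][1] is non-zero."""
    return [A[0][1], a - A[0][0]]

psi1, psi2 = eigenvector(a1), eigenvector(a2)
print(a1, a2)                  # imaginary parts vanish
print(scalar(psi2, psi1))      # vanishes: eigenvectors are orthogonal
```

For this matrix the eigenvalues are 4 and 1, and the self-adjointness condition (5.10.5) with $A^\dagger = A$ can be checked directly by comparing `scalar(psi2, apply(A, psi1))` with `scalar(apply(A, psi2), psi1)`.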

The second interpretive postulate of quantum mechanics is that in a state
represented by a state vector $\Psi$, the observable quantity represented by an
operator $A$ has the expectation value


$$\langle A\rangle_\Psi = \frac{(\Psi, A\Psi)}{(\Psi, \Psi)}. \tag{5.10.8}$$

Obviously it follows that if $\Psi$ is normalized so that $(\Psi, \Psi) = 1$, then the
expectation value is $(\Psi, A\Psi)$.


Suppose an observable is represented by an operator $A$ with discrete
eigenvalues $a_n$ and eigenvectors $\Phi_n$,


$$A\Phi_n = a_n\Phi_n, \tag{5.10.9}$$

and we normalize these eigenvectors so that

$$(\Phi_n, \Phi_m) = \delta_{nm}. \tag{5.10.10}$$


(If there is only one eigenvector for each eigenvalue it follows from Eq. (5.10.7)
and the reality of eigenvalues that the different eigenvectors are orthogonal,
and we can always multiply them by numerical factors so that they satisfy
Eq. (5.10.10). Even in the case of degeneracy, with several eigenvectors for
the same eigenvalue, we can always define linear combinations of these
eigenvectors to satisfy Eq. (5.10.10).) If we expand an arbitrary state vector $\Psi$ in a
series of these eigenvectors $\Psi = \sum_n c_n\Phi_n$, by taking the scalar product with
any of the $\Phi_n$ and using Eq. (5.10.10) we find that $c_n = (\Phi_n, \Psi)$, so that

$$\Psi = \sum_n(\Phi_n, \Psi)\,\Phi_n. \tag{5.10.11}$$


Inserting this into Eq. (5.10.8) gives the expectation value of the observable
represented by $A$:

$$\langle A\rangle_\Psi = \frac{\sum_n a_n|(\Phi_n, \Psi)|^2}{\sum_n|(\Phi_n, \Psi)|^2}. \tag{5.10.12}$$


Since a corresponding result applies for any function of this observable, it
follows from Eq. (5.10.12) that the probability of finding a value $a_n$ when we
measure the observable represented by $A$ is


$$P(a_n) = \frac{|(\Phi_n, \Psi)|^2}{\sum_m|(\Phi_m, \Psi)|^2}. \tag{5.10.13}$$


Note in particular that the sum of these probabilities is one.

As in Section 5.3, we can pass over to the case of an operator $A$ with a
continuum of eigenvalues by supposing that it has a very large number of very
close discrete eigenvalues. If there are $\mathcal{N}(a)\,da$ eigenvalues between $a$ and
$a + da$ then, in the limit of close packing, we can evaluate sums over $n$ by
replacing them with integrals over $a$:


$$\sum_n \to \int\mathcal{N}(a)\,da. \tag{5.10.14}$$


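Making this replacement, and defining renormalized eigenvectors

$$\Upsilon_a \equiv \sqrt{\mathcal{N}(a)}\,\Phi_n \quad\text{for } a = a_n, \tag{5.10.15}$$

Eqs. (5.10.11) and (5.10.14) give


$$\Psi = \int da\,\Upsilon_a\,(\Upsilon_a, \Psi), \tag{5.10.16}$$


and Eq. (5.10.12) gives

$$\langle A\rangle_\Psi = \frac{\int a\,|(\Upsilon_a, \Psi)|^2\,da}{\int|(\Upsilon_a, \Psi)|^2\,da}. \tag{5.10.17}$$


We conclude that the probability that a measurement of the observable
represented by $A$ will give a value in the range $a$ to $a + da$ is $P(a)\,da$, where $P(a)$
is the probability density:


$$P(a) = \frac{|(\Upsilon_a, \Psi)|^2}{\int|(\Upsilon_{a'}, \Psi)|^2\,da'}, \tag{5.10.18}$$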

with the normalization of the state vectors $\Upsilon_a$ fixed by the condition (5.10.15).
In particular, if in Eq. (5.10.16) we take $\Psi = \Upsilon_{a'}$, we find

$$\Upsilon_{a'} = \int da\,\Upsilon_a\,(\Upsilon_a, \Upsilon_{a'}),$$


so with this normalization the scalar product of these eigenvectors is the Dirac
delta function discussed in Section 5.6,


$$(\Upsilon_a, \Upsilon_{a'}) = \delta(a - a'). \tag{5.10.19}$$


Of course, if we also normalize the state vector $\Psi$ so that

$$\int|(\Upsilon_a, \Psi)|^2\,da = 1,$$


then the probability density is

$$P(a) = |(\Upsilon_a, \Psi)|^2. \tag{5.10.20}$$
It should by now be clear that the wave function $\psi(x)$ (for instance, for a
single particle in one dimension) is nothing but the scalar product


$$\psi(x) = (\Upsilon_x, \Psi), \tag{5.10.21}$$


where $\Psi$ is the state vector representing the physical state and $\Upsilon_x$ is a state
vector, normalized to satisfy Eq. (5.10.15) or equivalently Eq. (5.10.19),
representing a state in which the particle is at $x$. We can use suitably
normalized eigenvectors of operators representing any other observables to define
corresponding wave functions $(\Upsilon_a, \Psi)$, such as the momentum-space wave
function introduced in Section 5.3.

In general, eigenvalues and probabilities are to be calculated using relations 
among operators that represent physical observables, including commutation 
relations and formulas giving the operators that represent conserved quantities 
such as the Hamiltonian and angular momentum in terms of other operators. 
These relations embody the physical content of any particular quantum- 
mechanical theory. 
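
As a concrete finite-dimensional illustration of Eqs. (5.10.8) and (5.10.11) to (5.10.13), the sketch below takes a two-state system with an assumed observable (a real symmetric matrix with eigenvalues $\pm 1$) and an unnormalized state vector. It computes the measurement probabilities from the expansion coefficients and compares the resulting expectation value with the direct expression $(\Psi, A\Psi)/(\Psi, \Psi)$. The matrix, the state vectors, and the function names are all choices made here for illustration.

```python
import math

# Assumed observable: eigenvalues +1 and -1, eigenvectors (1, +1)/sqrt(2) and (1, -1)/sqrt(2).
A = [[0.0, 1.0], [1.0, 0.0]]
S = 1.0 / math.sqrt(2.0)
EIGEN = [(1.0, [S, S]), (-1.0, [S, -S])]   # pairs (a_n, Phi_n), orthonormal

def scalar(phi, psi):
    """Scalar product (phi, psi), conjugate-linear in the first argument."""
    return sum(complex(p).conjugate() * q for p, q in zip(phi, psi))

def apply(matrix, v):
    return [matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1]]

def probabilities(psi):
    """P(a_n) = |(Phi_n, Psi)|^2 / sum_m |(Phi_m, Psi)|^2, as in Eq. (5.10.13)."""
    weights = [abs(scalar(phi, psi)) ** 2 for _, phi in EIGEN]
    total = sum(weights)
    return [w / total for w in weights]

def expectation_from_probabilities(psi):
    """Sum of a_n P(a_n), as in Eq. (5.10.12)."""
    return sum(a * p for (a, _), p in zip(EIGEN, probabilities(psi)))

def expectation_direct(psi):
    """(Psi, A Psi) / (Psi, Psi), as in Eq. (5.10.8)."""
    return (scalar(psi, apply(A, psi)) / scalar(psi, psi)).real

for psi in ([2.0, 1.0], [1.0, 1j]):
    print(probabilities(psi), expectation_from_probabilities(psi), expectation_direct(psi))
```

The state vectors here are deliberately left unnormalized, to show that the denominators in Eqs. (5.10.8) and (5.10.13) take care of the normalization.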


6 
Nuclear Physics 


Atoms were at the center of physicists’ interest in the 1920s. It was largely 
from the effort to understand atomic properties that modern quantum mechan- 
ics emerged in this decade. In this work physicists did not have to concern 
themselves much with the nature of the atomic nucleus. It had been known 
since Rutherford’s interpretation in 1911 of the scattering experiments in his 
laboratory that almost all the mass of atoms is contained in a tiny positively 
charged nucleus, but all that the atomic physicist needed to know about this 
nucleus was its electric charge, mass, and (to account for hyperfine splitting) its 
spin and magnetic moment. 

In the 1930s physicists’ concerns expanded to include the nature of atomic 
nuclei. The constituents of the nucleus were identified, and a start was made in 
learning what held them together. And, as everyone knows, world history was 
changed in subsequent decades by the military application of nuclear physics. 


6.1 Protons and Neutrons 


Discovery of the Proton 


The first known constituent of the atomic nucleus was the proton. In a series of 
experiments in 1919 on the passage of alpha particles from radioactive nuclei 
through various gases, Rutherford found that collisions of alpha particles with 
nitrogen atoms produced penetrating rays of particles whose range and deflec- 
tion by electric and magnetic fields seemed identical to what would be expected 
for hydrogen nuclei.1 The reaction is now known to be $^{14}\mathrm{N} + {}^{4}\mathrm{He} \to {}^{17}\mathrm{O} + {}^{1}\mathrm{H}$,
and is shown on a seven cent postage stamp of New Zealand, the country of
Rutherford’s birth. Rutherford at first called these “H particles,” and he
speculated that they were constituents of all atomic nuclei. In the following year he
gave them their modern name, protons. 


1 E. Rutherford, Phil. Mag. Series 6 37, 381 (1919); reproduced in Beyer, Foundations of Nuclear Physics, 
listed in the bibliography. 
It was clear from the beginning that protons could not be the only constituents
of atomic nuclei. This would have been close to a realization of a hypothesis in 
1815 of the chemist William Prout (1785-1860). Observing that known atomic 
weights were generally close to whole number multiples of the atomic weight 
of hydrogen, Prout proposed that all atoms are composites of hydrogen atoms. 
Applying Prout’s hypothesis to nuclei rather than to atoms would have done 
well in accounting for nuclear masses (which provide almost all of the masses 
of atoms). It would even work when applied to isotopes, sets of atoms that have 
an equal number of electrons and hence display the same chemical behavior 
but differ in their atomic weights. Measurements at the Cavendish Laboratory 
by Francis William Aston (1877-1945) had shown by 1919 that the atomic 
weights of various isotopes of hydrogen, carbon, oxygen, chlorine, etc. were 
all close to whole number multiples of the atomic weight of the lightest isotope 
of hydrogen. But to suppose that nuclei are made up only of protons would have 
entirely failed in dealing with nuclear electric charges. If nuclei were composed 
only of protons their atomic weights in units of the atomic weight of hydrogen 
would all be close to their atomic numbers, which as we saw in Section 3.4 
were by 1919 already known to equal their electric charges in units of the proton 
charge. But light nuclei such as helium, carbon, nitrogen, oxygen, etc. typically 
have atomic weights close to twice their atomic numbers. 


Electrons in the Nucleus? 


In his celebrated Bakerian lecture to the Royal Society of London in 1920,2
Rutherford proposed that nuclei consist of two kinds of particle: protons and
electrons. He was undecided about how these particles might be grouped within
nuclei, though he tentatively proposed that nuclei consist of alpha particles
(known to be $^{4}$He nuclei), supposed to consist of four protons and two
electrons, and nuclei of the isotope $^{3}$He, which Rutherford had discovered in the
collisions of alpha particles with nuclei of nitrogen and oxygen, supposed to 
consist of three protons and an electron. In his lecture he also proposed the 
existence of neutral particles later called neutrons, with a mass similar to the 
proton’s, and with no electric charge. But for Rutherford the neutron was not a 
new particle — it was a composite of a proton and one strongly bound electron. 

The theory that nuclei consist of protons and electrons had some plausibility. 
Because electrons have so much less mass than protons, this theory implied 
that all atomic weights would be close to whole number multiples of the atomic 
weight of a single proton, the nucleus of hydrogen, as had been noticed by Prout.
Also, some nuclei were known to emit electrons in beta radioactivity. But it was 
hard to see how this could work dynamically. In particular, if there are states 
of an electron and a proton that are much more deeply bound than a hydrogen 


2 E. Rutherford, Proc. Roy. Soc. A 97, 374 (1920). 
atom, then why do the electrons in ordinary atoms including hydrogen atoms
not all fall into these states, emitting the released energy as radiation? 

There was an even stronger argument coming from molecular physics against 
supposing nuclei to consist only of protons and electrons. As we saw in 
Section 5.5, we can tell whether the identical nuclei in a diatomic molecule 
are bosons or fermions from the ratio of intensities of transitions in the para 
and ortho states, which have orbital angular momentum ℓ respectively even and
odd. At temperatures T for which the energies of these transitions are much less
than kT, the total intensity of the para lines is greater than for the ortho lines by
a factor (s_N + 1)/s_N if the spin s_N of each nucleus is an integer and the nuclei are
bosons, while the total intensity of the para lines is less than for the ortho lines
by a factor s_N/(s_N + 1) if the spin s_N of each nucleus is a half odd integer and the
nuclei are fermions. In 1929 Walter Heitler (1904-1981) and Gerhard Herzberg
(1904-1999) observed that the total intensity of the para lines in the diatomic
nitrogen molecule is greater than the intensity of the ortho lines, indicating that
the nucleus of the most common nitrogen isotope, 14N, is a boson.3 (In fact,
we now know that it has spin 1.) But if nuclei consist of protons and electrons,
then the 14N nucleus would consist of 14 protons to give atomic weight 14, and
seven electrons, to give atomic number 14 − 7 = 7, adding up to 14 + 7 = 21
fermions, and the 14N nucleus would be a fermion.
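
The counting argument can be made concrete with a short sketch. The function names below are illustrative only; the rule used is the one stated in the text, that a composite of an odd number of spin 1/2 fermions is a fermion and a composite of an even number is a boson, together with the quoted high-temperature intensity ratios.

```python
def statistics(n_spin_half_constituents):
    """Even number of spin-1/2 constituents -> boson, odd number -> fermion."""
    return "boson" if n_spin_half_constituents % 2 == 0 else "fermion"


def para_to_ortho_ratio(spin, kind):
    """Ratio (para intensity)/(ortho intensity) quoted in the text for kT large
    compared with the transition energies. Not meaningful for spin-0 bosons,
    which have no ortho lines at all."""
    if kind == "boson":
        return (spin + 1) / spin
    return spin / (spin + 1)


# 14N in the proton-electron picture: 14 protons + 7 electrons = 21 fermions.
print(statistics(14 + 7))
# 14N built from 7 protons + 7 neutrons, all spin 1/2.
print(statistics(7 + 7))
# Spin-1 bosonic nuclei: para lines stronger than ortho lines by this factor.
print(para_to_ortho_ratio(1, "boson"))
```

Only the second picture reproduces the observed boson behavior of 14N, which is the puzzle taken up in what follows.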


Discovery of the Neutron 


This puzzle began to be resolved in 1932 with the discovery of the neutron4
by James Chadwick (1891-1974), Rutherford’s second in command at the
Cavendish Laboratory at Cambridge. Chadwick had learned about observations
in Paris5 that showed that collisions of energetic alpha particles with beryllium
atoms produce highly penetrating electrically neutral rays, which when directed 
into a hydrogen-rich substance like paraffin produce protons that recoil with 
very high energy. Experiments at the Cavendish Laboratory showed that these 
neutral rays would also cause heavier nuclei to recoil, though with smaller 
recoil velocities, and from the ratios of the recoil velocities he was able to 
calculate the mass of the particles making up the neutral rays. It follows from 
Eq. (3.3.1) that if a particle B moving with velocity v_B strikes a particle A at
rest, and A recoils in the same direction as the initial direction of motion of B,
then its recoil velocity will be

\[ v_A' = \frac{2 m_B}{m_A + m_B}\, v_B \]

3 W. Heitler and G. Herzberg, Naturwiss. 17, 673 (1929). 

4 J. Chadwick, Proc. Roy. Soc. A 136, 692 (1932), reproduced in Beyer, Foundations of Nuclear Physics, 
listed in the bibliography. 

5 I. Curie and F. Joliot, Compt. Rend. Acad. Sci. Paris 194, 273 (1932). 
Chadwick did not know the initial velocity v_B, but he could eliminate it by
taking the ratio of recoil velocities for different target nuclei of known atomic
weights, and from this ratio he could calculate the atomic weight A_n of the
particle comprising the neutral ray. For instance, measurements showed that
the same neutral ray from beryllium that causes hydrogen nuclei to recoil
straight back with speed 3.3 × 10^7 m/sec would cause nitrogen nuclei to
recoil straight back with speed 4.7 × 10^6 m/sec, so

\[ \frac{3.3 \times 10^{7}}{4.7 \times 10^{6}} = \frac{A_n/(1 + A_n)}{A_n/(14 + A_n)} = \frac{14 + A_n}{1 + A_n} \]

from which it follows that A_n ≈ 1.16. Chadwick concluded that these neutral
rays consist of particles he called neutrons, with mass close to that of hydrogen. 
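
The arithmetic of this estimate can be checked with a few lines of code. This is only a sketch of the head-on recoil formula and the ratio argument above; the function names are illustrative and the atomic weights of hydrogen and nitrogen are rounded to 1 and 14, as in the text.

```python
def recoil_velocity(m_target, m_projectile, v_projectile):
    """Recoil speed of a target at rest struck head-on by the projectile."""
    return 2.0 * m_projectile / (m_target + m_projectile) * v_projectile


def projectile_weight(v_light, v_heavy, a_light, a_heavy):
    """Solve v_light / v_heavy = (a_heavy + A) / (a_light + A) for A."""
    ratio = v_light / v_heavy
    return (a_heavy - ratio * a_light) / (ratio - 1.0)


# Recoil speeds quoted in the text, in m/sec, for hydrogen and nitrogen.
A_n = projectile_weight(3.3e7, 4.7e6, 1.0, 14.0)
print(f"A_n = {A_n:.2f}")

# Consistency check: the unknown initial speed cancels in the ratio.
v_any = 1.0
check = recoil_velocity(1.0, A_n, v_any) / recoil_velocity(14.0, A_n, v_any)
print(f"ratio of recoil speeds = {check:.2f}")
```

Because the unknown projectile speed drops out of the ratio, any value of v_any gives the same result in the consistency check.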

Chadwick assumed that this was the neutron that Rutherford had anticipated 
in his 1920 Bakerian lecture, and he followed Rutherford in supposing that the 
neutron is a proton—electron bound state. He knew about the problem that study 
of the diatomic nitrogen molecule indicated that the 14N nucleus is a boson,
which is not possible if it consists of 14 protons and seven electrons (whether or
not combined into nuclei of 4He or 3He or proton-electron composites), but at
first he decided to ignore the problem. This may have been due to a widespread 
reluctance at the time to contemplate any new fundamental particles besides 
the proton, electron, and photon, or perhaps it was just the influence of the 
formidable Lord Rutherford. The status of the neutron as a fermion that is every 
bit as elementary as the proton only became clear with studies of the forces 
between these particles, to be discussed in the next section. As a result of these 
studies, neutrons and protons became regarded as two members of a family of 
particles known as nucleons. 


Nuclear Radius and Binding Energy 


Like the states of electrons in atoms, the states of nucleons in all but the lightest 
nuclei can be described approximately by the Hartree approximation: each 
nucleon can be supposed to move in a potential due to all the other nucleons. 
Because nuclear forces have short range, each nucleon is chiefly affected by 
nucleons with the same one-nucleon orbital wave function. And, because 
nucleons are spin 1/2 fermions satisfying the Pauli exclusion principle, there 
are just three of these: for a proton (or neutron) state there is another proton (or 
neutron) state with opposite spin 3-component, and two neutron or proton states 
with each value for the spin 3-component. Thus, whatever the total number A 
of nucleons, as a first approximation the binding energy per nucleon and the 
volume per nucleon tend to be similar for all nuclei. This is known as the 
saturation of nuclear forces. 

With a constant volume per nucleon, the volume of a nucleus is proportional 
to the number A of nucleons, so the nuclear radius R is proportional to A^{1/3}.
These radii can be calculated from measurements of the effect of the nuclear
electric quadrupole moment on atomic spectra; from measurements of the scattering
of electrons in the Coulomb field of the nucleus; and from the measured
rates of alpha decays, to be discussed in Section 6.4. A consensus of these
measurements gives a nuclear radius

\[ R \simeq 1.3 \times 10^{-13}\ \mathrm{cm} \times A^{1/3} \qquad (6.1.1) \]


The binding energy of a nucleus is the energy required to take all of its nucleons 
to rest at a great distance. It can easily be calculated from measurements of 
atomic weights: it is the sum of the atomic weights of all the nucleons in the 
nucleus minus the atomic weight of the nucleus, times the mass energy m_1c^2 =
931.494 MeV of unit atomic weight.
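
As a rough illustration of both definitions, the sketch below evaluates the radius formula (6.1.1) and the binding energy of 4He. The atomic weights of the hydrogen atom, the neutron, and the 4He atom are standard tabulated values assumed here, not numbers given in the text; atomic rather than nuclear weights are used so that the electron masses cancel.

```python
M1_C2_MEV = 931.494      # mass energy of unit atomic weight, as quoted in the text
R0_CM = 1.3e-13          # coefficient in Eq. (6.1.1), in cm

# Assumed tabulated atomic weights (not taken from the text).
W_HYDROGEN = 1.007825    # 1H atom
W_NEUTRON = 1.008665     # free neutron
W_HELIUM4 = 4.002603     # 4He atom


def nuclear_radius_cm(a):
    """Nuclear radius from Eq. (6.1.1) for nucleon number a."""
    return R0_CM * a ** (1.0 / 3.0)


def binding_energy_mev(z, n, atomic_weight):
    """(Sum of constituent weights minus weight of the atom) times m_1 c^2."""
    mass_defect = z * W_HYDROGEN + n * W_NEUTRON - atomic_weight
    return mass_defect * M1_C2_MEV


b_he4 = binding_energy_mev(2, 2, W_HELIUM4)
print(f"4He binding energy: {b_he4:.1f} MeV, or {b_he4 / 4:.1f} MeV per nucleon")
print(f"Radius for A = 238: {nuclear_radius_cm(238):.2e} cm")
```

The result for 4He, about 28 MeV in total, shows how large nuclear binding energies are compared with the electron-volt scale of atomic binding.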


Liquid Drop Model 


According to the idea of the saturation of nuclear force, the dominant term in 
the binding energy per nucleon is a constant, estimated to be about 15.8 MeV.6
There are several corrections to this simple rule, which taken together provide 
the liquid drop model of the nucleus. 


Surface Tension 


With a nuclear radius proportional to A^{1/3} the surface area of the nucleus is
proportional to A^{2/3}, so a fraction proportional to A^{-1/3} of the A nucleons
is closer to the surface than the range of the nuclear force and therefore feels
less attraction to other nucleons. This decreases the nuclear binding energy
per nucleon by a term proportional to A^{-1/3}, estimated from measured atomic
weights as −18.3 A^{-1/3} MeV.


Coulomb Repulsion 


The electrostatic repulsion of Z protons introduces a negative term in the total
binding energy proportional to Z^2 and to the inverse nuclear radius, which is
proportional to A^{-1/3}. The Coulomb contribution to the binding energy per
nucleon is therefore proportional to Z^2 A^{-4/3}. It is approximately −0.71 Z^2 A^{-4/3}
MeV. (The energy coefficient here is smaller than for the other terms in the
binding energy because electric forces are intrinsically weaker than nuclear
forces. For instance, the Coulomb energy of a uniformly charged sphere with
charge Ze and radius (6.1.1) is 3Z^2e^2/5R = 0.66 Z^2 A^{-1/3} MeV.)
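
The coefficient 0.66 MeV in the uniformly charged sphere estimate can be reproduced directly. The sketch assumes the standard value e^2 ≈ 1.44 MeV fm in the unrationalized units used here (a value not quoted in the text) and takes the radius coefficient 1.3 × 10^{-13} cm = 1.3 fm from Eq. (6.1.1).

```python
E2_MEV_FM = 1.44   # assumed value of e^2 in MeV fm (unrationalized units)
R0_FM = 1.3        # radius coefficient of Eq. (6.1.1), in fm


def coulomb_energy_mev(z, a):
    """Coulomb energy 3 Z^2 e^2 / (5 R) of a uniformly charged sphere."""
    radius_fm = R0_FM * a ** (1.0 / 3.0)
    return 3.0 * z ** 2 * E2_MEV_FM / (5.0 * radius_fm)


# Coefficient multiplying Z^2 A^(-1/3), in MeV.
coefficient = 3.0 * E2_MEV_FM / (5.0 * R0_FM)
print(f"Coulomb coefficient: {coefficient:.2f} MeV")
```

That this simple estimate lands close to the fitted value 0.71 MeV suggests the Coulomb term in the binding energy is essentially classical electrostatics.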


6 The numerical values of coefficients of various terms in the nuclear binding energy are rounded off here 
from values derived from a fit to measured binding energies by A. H. Wapstra and N. B. Gove, Nuclear 
Data Tables 9, 267 (1971). 
Neutron–Proton Inequality


The Pauli exclusion principle leads to a decrease in the binding energy for nuclei 
with unequal numbers of protons and neutrons. Given a nucleus with equal 
numbers of protons and neutrons, if we imagine a proton changed into a neutron 
the new neutron would be forced by the exclusion principle to occupy a state 
of energy higher than any of the originally occupied neutron states. Because 
of the symmetry between protons and neutrons (discussed in the next section), 
with equal numbers of protons and neutrons the highest energy of the originally 
occupied neutron states equals the highest energy of the originally occupied 
proton states, so changing this proton into a neutron necessarily increases its 
energy. The same is true if we change a neutron into a proton. This decrease in 
the total binding energy is approximately proportional to (N − Z)^2/A, where
N = A − Z is the number of neutrons. It is taken as proportional to 1/A to take
account of the decrease in the spacing of nuclear energy levels with increasing
A. Observed binding energies indicate a term in the binding energy per nucleon
of −23.2 MeV × (A − 2Z)^2/A^2.

Putting this together, the binding energy per nucleon is approximately

\[ \text{binding energy}/A \simeq 15.8 - 18.3\, A^{-1/3} - 0.71\, Z^2 A^{-4/3} - 23.2\, (A - 2Z)^2 A^{-2}\ \mathrm{MeV}. \qquad (6.1.2) \]


There are also sporadic bumps in the binding energy. Nuclei with even or odd
numbers both of protons and of neutrons have an additional term in the binding
energy that is about 12/√A MeV or −12/√A MeV, respectively. Also, the
binding energy is increased for certain “magic” numbers of protons or neutrons, 
to be discussed in Section 6.3. 
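
A minimal sketch of Eq. (6.1.2) follows, leaving out the even-odd and magic-number corrections just mentioned. The function name is illustrative; the coefficients are the rounded values quoted above.

```python
def binding_energy_per_nucleon(a, z):
    """Liquid drop estimate of Eq. (6.1.2), in MeV per nucleon."""
    volume = 15.8                                   # saturation of nuclear forces
    surface = -18.3 * a ** (-1.0 / 3.0)             # surface tension
    coulomb = -0.71 * z ** 2 * a ** (-4.0 / 3.0)    # proton-proton repulsion
    asymmetry = -23.2 * (a - 2 * z) ** 2 / a ** 2   # neutron-proton inequality
    return volume + surface + coulomb + asymmetry


for name, a, z in [("16O", 16, 8), ("56Fe", 56, 26), ("238U", 238, 92)]:
    print(f"{name}: {binding_energy_per_nucleon(a, z):.2f} MeV per nucleon")
```

With these coefficients the formula gives a little under 9 MeV per nucleon near iron and about 7.6 MeV for uranium 238, so the broad trend across the periodic table comes out of just four terms.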


Stable Valley and Decay Modes 


For a given value of A, the most deeply bound nucleus has a value of Z given 
by the stationary point of the binding energy per nucleon (6.1.2): 


\[ Z \simeq \frac{A}{2 + 0.015\, A^{2/3}} \qquad (6.1.3) \]

Nuclei with smaller or larger values of Z for a given A tend to decay into the
nucleus whose Z is given approximately by Eq. (6.1.3), with the emission of an
electron or its antiparticle, the process known as beta decay, to be discussed in
Section 6.5. In a contour map of nuclear masses plotted against A and Z, the
nuclei satisfying Eq. (6.1.3) form a valley of relatively high binding energy and
hence low mass, known as the stable valley.

For A < 50 Eq. (6.1.3) gives Z close to A/2, as was noticed with the earliest
measurements of the atomic numbers of nuclei such as 4He, 12C, 14N, 16O,
etc. As we consider nuclei with increasing values of A the Coulomb repulsion
among the protons becomes more and more important, and the nuclei with
the lowest ground state energy tend to have an increasing ratio of neutrons to 
protons. For instance, for A = 56 the nucleus with the lowest ground state 
energy is 56Fe, with 26 protons and 30 neutrons. The atomic numbers of the
stable valley fall increasingly below the line Z = A/2 for larger values of A, to 
a value Z = 92 for A = 238. 
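
The origin of the coefficient 0.015 in Eq. (6.1.3) can be seen by setting to zero the derivative with respect to Z of the Coulomb and neutron–proton inequality terms in Eq. (6.1.2). The sketch below does this with the rounded coefficients 0.71 and 23.2 and compares the result with the examples in the text; the function names are illustrative.

```python
def valley_z(a):
    """Z of the most deeply bound nucleus for given A, Eq. (6.1.3)."""
    return a / (2.0 + 0.015 * a ** (2.0 / 3.0))


def valley_z_from_coefficients(a, coulomb=0.71, asymmetry=23.2):
    """Stationary point of Eq. (6.1.2) in Z at fixed A:
    -2*coulomb*Z*A^(-4/3) + 4*asymmetry*(A - 2Z)/A^2 = 0."""
    return a / (2.0 + coulomb / (2.0 * asymmetry) * a ** (2.0 / 3.0))


print(f"coefficient: {0.71 / (2.0 * 23.2):.4f}")
for a in (16, 56, 238):
    print(f"A = {a}: Z = {valley_z(a):.1f}")
```

For A = 238 this gives Z close to 92, and for A = 56 a value near 25, one unit below the observed 26, which is the level of accuracy one can expect from a smooth formula that ignores the even-odd term.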

In the stable valley, Eqs. (6.1.2) and (6.1.3) give a binding energy per nucleon 
that increases with increasing A for lighter nuclei, owing to the decreasing effect 
of surface tension, reaches a maximum of about 9 MeV for iron and nickel, and 
then, because of the Coulomb term, decreases slowly for larger A, taking a 
value of about 7.5 MeV for 238U. The decrease with A of the binding energy
per nucleon for heavy nuclei makes it energetically favorable for these nuclei to
decay by splitting into fragments, either by spontaneous fission into two nuclei
of much lower A, or more often by emitting an alpha particle. After emitting
one or a few alpha particles a nucleus becomes excessively neutron-rich for the 
new, lower, value of A, and it becomes energetically favorable for the nucleus to 
lower the neutron—proton ratio by one or more beta decays, moving back toward 
the stable valley. These alpha and beta decay processes sometimes yield nuclei 
in excited states, which then undergo gamma decay to the ground state, emitting 
an energetic photon. A succession of alpha, beta, and gamma decays continues 
until the nucleus transforms into a non-radioactive nucleus, such as one of the 
stable isotopes of lead. 

For instance, in the decay chain that is most important in the history of 
physics, uranium 238 alpha-decays to thorium 234 with a half life of 4.47 × 10^9
years, and then, with much shorter half lives, thorium 234 beta-decays to 
protactinium 234, which beta-decays to uranium 234, which alpha-decays 
to thorium 230, which alpha-decays to radium 226, which alpha-decays to 
radon 222 (an example of alpha decay considered in detail in Section 6.4), 
which alpha-decays to polonium 218, which alpha-decays to lead 214, which 
beta-decays to bismuth 214, which beta-decays to polonium 214, which alpha- 
decays to lead 210, which beta-decays to bismuth 210, which beta-decays to 
polonium 210, which alpha-decays to the stable isotope lead 206, which makes 
up 24% of natural lead. 
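
The bookkeeping of this chain is easy to verify: each alpha decay lowers A by 4 and Z by 2, while each beta decay with electron emission leaves A unchanged and raises Z by 1. The sketch below simply replays the sequence of decays listed above; the symbol table and function names are illustrative.

```python
SYMBOL = {92: "U", 91: "Pa", 90: "Th", 88: "Ra", 86: "Rn",
          84: "Po", 83: "Bi", 82: "Pb"}

# Sequence of decays in the text: "a" = alpha decay, "b" = beta decay.
STEPS = "a b b a a a a a b b a b b a".split()


def decay(a, z, mode):
    """Return (A, Z) of the daughter nucleus."""
    if mode == "a":
        return a - 4, z - 2
    if mode == "b":
        return a, z + 1
    raise ValueError(f"unknown decay mode: {mode}")


def chain(a, z, steps):
    """List of (A, Z) pairs visited, starting from the parent."""
    nuclides = [(a, z)]
    for mode in steps:
        a, z = decay(a, z, mode)
        nuclides.append((a, z))
    return nuclides


nuclides = chain(238, 92, STEPS)
labels = [f"{SYMBOL[z]}-{a}" for a, z in nuclides]
print(" -> ".join(labels))
```

Eight alpha decays and six beta decays take (A, Z) = (238, 92) to (206, 82), in agreement with the end point of the chain at lead 206.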


6.2 Isotopic Spin Symmetry 


There is a deep symmetry between protons and neutrons, which made it evident 
that neutrons are fermions and just as elementary as protons. Knowledge of this 
symmetry emerged in the late 1930s from a study of the forces among protons 
and neutrons. 
Nuclear Forces


The first of the nuclear forces to be studied was that between a proton and 
a neutron, which could be measured by observing the scattering of neutrons 
on the protons in a hydrogen-rich substance such as paraffin. As in all scattering
processes, the scattering amplitude f(x) introduced in Section 5.6 may
be expanded as a sum over terms with angular dependence proportional to the
spherical harmonic functions Y_ℓ^m(x) defined in Section 5.2. The terms with
ℓ > 0 are suppressed at low energy by a centrifugal barrier, which makes
the wave function vanish for vanishing separation r as r^ℓ, so at the energies
available in the 1930s the scattering was dominated by the term with ℓ = 0, for
which the scattering amplitude f is independent of direction. But it is important
here to keep track of the dependence of the scattering amplitude on spin, which
we ignored in Section 5.6. With the neutron taken like the proton to have spin
1/2, there are now two terms in the amplitude for neutron–proton scattering,
with total spin s = 0 or s = 1. In the absence of orbital angular momentum
the total spin is conserved in the scattering process, so the total scattering cross
section takes the form σ_0 + σ_1, where σ_s is the cross section in the ℓ = 0
proton-neutron state with total spin s. It is possible to separate the contributions
of spin zero and spin one by using data on the deuteron, a proton–neutron bound
state with ℓ = 0 (and a small admixture of ℓ = 2) and with total angular
momentum j = 1 and hence total spin s = 1. There is a classic relation7 that to
a good approximation gives σ_1 = 2πħ^2/μB, where μ is the reduced mass of a
proton and a neutron and B is the deuteron binding energy, so using scattering
data and the deuteron binding energy one can separately find σ_0 and σ_1.

This is important because protons and neutrons are fermions, so the ℓ = 0
state of two protons or two neutrons must be antisymmetric in the particles’
spin 3-components. As can be seen from either Eq. (5.4.42) or Table 5.1, this
requires the state to have total spin zero. It is therefore of interest to compare the
value of σ_0 deduced for proton–neutron scattering for s = 0 with the observed
total low-energy proton–proton scattering cross section.

Unfortunately there is no way to make a target out of the electrically neutral 
(and, as we shall see, unstable) neutron, so it was not possible to make a direct 
measurement of neutron—neutron scattering. There is no similar obstacle to the 
measurement of proton—proton scattering for, as in Rutherford’s 1919 experi- 
ments, one can make a target of a hydrogen gas or a proton-rich substance like 
paraffin. Here the problem is that at low energy the scattering is almost entirely 
due to the Coulomb potential, and reveals nothing about the nuclear forces. 


7 For a textbook derivation, see Section 8.8 of Weinberg, Lectures on Quantum Mechanics, listed in the 
bibliography. 
The measurements in Rutherford’s laboratory of the scattering of alpha particles
by various nuclei had indicated that the range of nuclear forces is no larger than
about R ≈ 10^{-13} cm. In order for two protons to approach to a distance less than
this, it is necessary for their kinetic energy to be greater than e^2/R ≈ 1.4 MeV.
High-energy proton beams became available in the 1930s with the invention of
accelerators with potential differences produced electrostatically, which were
used8 to make accurate measurements of proton–proton scattering. It turned out
that when the scattering amplitude due to Coulomb forces was subtracted, the
ℓ = 0 part of the purely nuclear proton–proton scattering amplitude was equal
to the previously measured ℓ = 0 proton–neutron scattering amplitude in the
state with total spin zero.


Isotopic Spin Rotations 


The equality of forces soon led two pairs of theorists9 to propose that the laws
governing nuclear forces (whatever they are) respect a symmetry among neu- 
trons and protons. It is not just that these laws do not change if everywhere in 
the equations we change neutrons into protons and protons into neutrons. That 
would imply that the proton—proton nuclear force is the same as the neutron— 
neutron force but would say nothing about their relation to the proton—neutron 
force. Rather, according to the proposed symmetry principle, the laws governing 
nuclear (but not electromagnetic) forces are invariant under what is called an 
isotopic spin rotation, which acts not on momenta or ordinary spin but on the 
labels of the nuclear particles. The neutron and proton are supposed to form a 
doublet, called the nucleon:

\[ \begin{pmatrix} p \\ n \end{pmatrix} \]

on which isotopic spin rotations act in the same way mathematically that ordinary
rotations act on the two ordinary spin states of any particle with s = 1/2.
(Specifically, isotopic spin rotations act on the nucleon doublet as a 2 × 2 matrix
U having the property U† = U^{-1}, known as unitarity, and having determinant
unity. But we won’t need to use this information here.) Just as we saw
in Section 5.4 that the effect of infinitesimal ordinary rotations on physical
states is given by an angular momentum operator J, whose components satisfy
the commutation relations [J_i, J_j] = iħ ε_{ijk} J_k (where ε_{ijk} is the totally
antisymmetric quantity with ε_{123} = +1, and repeated indices are summed),
in the same way infinitesimal isotopic spin rotations are generated by a three-component
operator T, whose components satisfy the commutation relations


8 M. A. Tuve, N. Heydenberg, and L. Hafstad, Phys. Rev. 50, 850 (1936). 
9 B. Cassen and E. U. Condon, Phys. Rev. 50, 846 (1936); G. Breit and E. Feenberg, Phys. Rev. 50, 850 
(1936). 
[T_a, T_b] = iε_{abc}T_c. (The a, b, c indices can be taken like i, j, k to run over the
values 1, 2, 3, but of course for the isotopic spin these values have nothing to
do with directions in ordinary space. Repeated indices are again summed, and
ε_{abc} like ε_{ijk} is a totally antisymmetric quantity with ε_{123} = 1.) The proton and
neutron are taken as the states with T_3 = +1/2 and T_3 = −1/2, respectively.
Just as two particles with ordinary spin 1/2 can combine to form a compound
state with total spin s equal to 0 or 1, two nucleons can combine to form a compound
state with total isotopic spin 0 or 1, which transforms under isotopic spin
rotations in the same way that states with ordinary total spin 0 or 1 transform
under ordinary rotations. The invariance of nuclear forces under isotopic spin
rotations tells us that total isotopic spin is conserved, so the cross section for the
scattering of two nucleons is the sum of a cross section for isotopic spin 1 and
a cross section for isotopic spin zero. The states with total isotopic spin 1 form
a triplet, just like orbital angular momentum states with ℓ = 1, whose components
are a proton+proton state with T_3 = +1, a proton+neutron state with
T_3 = 0, and a neutron+neutron state with T_3 = −1. The proton+proton and
neutron+neutron ℓ = 0 states must be antisymmetric in the nucleon spin
3-components, and therefore have total ordinary spin 0. Since spin and isotopic
spin commute, the proton+neutron component of this triplet must then also
have spin zero. Since these three s-wave nucleon-nucleon states with total
ordinary spin 0 form a triplet, the scattering cross sections are the same for each.

On the other hand, an s-wave state of two nucleons with ordinary spin 1 
is symmetric in the spin 3-components, so it cannot be a proton+proton or 
neutron+neutron state, and can therefore only be a proton+neutron T_3 = 0 state
of a singlet with total isotopic spin zero. This is the deuteron, with total angular 
momentum and total ordinary spin both equal to one. 


Multiplets 


The implications of isotopic spin symmetry go far beyond the equality of s- 
wave nucleon-nucleon cross sections for total spin zero. Before we go into this, 
it is necessary to say something about the relation of isotopic spin quantum 
numbers and electric charge. For the proton—neutron doublet, it is obvious that 
the electric charge of a nucleon is 


\[ Q = e\left[T_3 + 1/2\right] \qquad (6.2.1) \]


so that protons and neutrons will have charges respectively e and 0. In a nucleus 
with B nucleons, the charge is the sum of (6.2.1) for all the nucleons, so 


\[ Q = e\left[T_3 + B/2\right], \qquad (6.2.2) \]


where now T is the isotopic spin operator of the whole nucleus and B is the 
number of nucleons. As we have seen, B is very close to the atomic weight A 
of the element, but we use the symbol B instead of A because they are not 
precisely equal, and in order that Eq. (6.2.2) should apply for some of the
particles discovered after World War II that are not composed of protons and 
neutrons. In this more general context, B is known as the baryon number. (For 
some unstable particles a quantity S known as strangeness that is conserved in 
strong and electromagnetic interactions must be added to B in Eq. (6.2.2).) 

Of course, electromagnetism does not respect isotopic spin symmetry: pro- 
tons are charged while neutrons are not. Equation (6.2.2) shows that in elec- 
tromagnetic phenomena involving the charge operator the 3-component of the 
isotopic spin operator plays a different role from the 1- and 2- components. 
There is also a nucleon mass difference, m_n − m_p = 1.293 MeV/c^2, which
contributes a term in the total rest mass proportional to T_3. For relatively light
nuclei, with atomic numbers less than about 20 to 30, Coulomb forces are 
less important than nuclear forces and isotopic spin symmetry is fairly well 
respected, but this is not true for heavy nuclei, where the Coulomb repulsion 
of protons in the nucleus comes close to tearing the nucleus apart. It makes no 
sense to talk about isotopic spin symmetry when we are dealing with uranium. 

Relatively light nuclei must form isotopic spin multiplets. We characterize
any multiplet by a total isotopic spin quantum number t, defined so that (just as
for ordinary spin multiplets) the multiplet consists of 2t + 1 nuclei with T_3
equal to t, t − 1, ..., −t, all with the same ordinary spin (that is, total angular
momentum) and with close to the same energy. Acting on the multiplet the
isotopic spin operator T satisfies T^2 = t(t + 1). The proton and neutron form a
t = 1/2 doublet, and the deuteron is a t = 0 singlet. There are many t = 1/2
doublets of complex nuclei; the lightest consists of the light isotope 3He of
helium, whose discovery was announced by Rutherford in his 1920 Bakerian
lecture, and tritium, the radioactive isotope 3H of hydrogen discovered at the
Cavendish Laboratory10 in 1934. The 3He nucleus consists of two protons and
one neutron and has atomic weight 3.01603, while the 3H nucleus is composed
of one proton and two neutrons and has atomic weight 3.01605. Both nuclei
have spin 1/2.

There are also triplets of nuclear states with t = 1, which show again that
this is a symmetry under transformations that go beyond the mere interchange
of protons and neutrons. A famous example includes the ground states of the
nuclei of 12B and 12N, which have B = 12 and charges 5e and 7e, and hence
according to Eq. (6.2.2) have T_3 = −1 and T_3 = +1. The T_3 = 0 member
of the triplet would then be ordinary carbon, 12C, with nuclear charge 6e. But
it is not the ground state of 12C, which has total angular momentum j = 0,
while the ground states of 12B and 12N both have j = 1. Also, although the 12B
and 12N ground states have nearly equal atomic weights, 12.0144 and 12.0186,
respectively, the 12C ground state by definition has atomic weight 12.0000. (The
greater binding energy of 12C is due to two effects mentioned in the previous


10 M. Oliphant, E. Harteck, and E. Rutherford, Nature 133, 413 (1934); Proc. Roy. Soc. A 144, 692 (1934). 
section: the numbers of protons and neutrons in 12C are equal, and both numbers
are even.) The small difference in atomic weights of 12B and 12N is due to the
greater Coulomb repulsion among the seven protons of 12N than among the five
protons of 12B, but this cannot account for the large difference from the atomic
weight of the ground state of carbon. In order to provide the T_3 = 0 member
of a triplet with 12B and 12N, there would have to be a spin 1 state of 12C
with an excitation energy well above the ground state. Since the number of
protons in 12C is the average of the numbers in 12B and 12N, we would expect
its excitation energy to be about 0.0165 m_1c^2 (the average of 0.0144 m_1c^2 and
0.0186 m_1c^2), or, taking m_1c^2 = 931.5 MeV, about 15.3 MeV. In fact there is
such a state, a spin 1 state of 12C that is 15.11 MeV above the 12C ground state,
which decays into the ground state by emission of a photon. This is the T_3 = 0
member of the triplet.
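
The assignment of T_3 values in these examples follows mechanically from Eq. (6.2.2), solved for T_3 = Q/e − B/2. The sketch below tabulates the doublet and triplet members discussed above; the function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def t3(charge, baryon_number):
    """T_3 from Eq. (6.2.2), with the charge given in units of e."""
    return Fraction(charge) - Fraction(baryon_number, 2)


def multiplet_t3_values(t):
    """The 2t + 1 values t, t - 1, ..., -t of T_3 in a multiplet."""
    t = Fraction(t)
    return [t - k for k in range(int(2 * t) + 1)]


nuclei = {"p": (1, 1), "n": (0, 1), "deuteron": (1, 2),
          "3He": (2, 3), "3H": (1, 3),
          "12B": (5, 12), "12C": (6, 12), "12N": (7, 12)}

for name, (charge, b) in nuclei.items():
    print(f"{name}: T_3 = {t3(charge, b)}")

print("t = 1 multiplet:", [str(v) for v in multiplet_t3_values(1)])
```

The T_3 values of 12N, 12C, and 12B are +1, 0, and −1, exactly the set required for a t = 1 triplet, while 3He and 3H fill out a t = 1/2 doublet.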


Why Isotopic Spin Symmetry? 


One may wonder why nuclear forces should obey a symmetry principle that 
is not obeyed by other forces, such as those of electromagnetism. Indeed, one 
should wonder. An invariance principle that applies only to some phenomena 
and not others can hardly be regarded as a fundamental physical principle. 
This puzzle became resolved in the modern theory of strong nuclear forces 
known as quantum chromodynamics.11 Briefly, in this theory the neutron and
proton are composed of two kinds of elementary spin 1/2 particles, the up quark
with charge 2e/3 and the down quark with charge −e/3. In close analogy
with how 3He and 3H are composed of protons and neutrons, the proton is
composed of two up quarks and a down quark, while the neutron consists of 
one up quark and two down quarks. Nuclear forces in quantum chromodynam- 
ics are carried by eight fields like the electromagnetic field, only interacting 
with a quantum number known whimsically as color instead of charge. At the 
energies characteristic of nuclear phenomena these forces are much stronger 
than electromagnetic forces, which is why the composite nature of protons and 
neutrons is not apparent in most nuclear phenomena and why electromagnetism 
can be treated as a small perturbation in studying light nuclei. The quarks all 
carry the same set of colors, so strong nuclear forces do not distinguish up 
from down quarks, but isotopic spin symmetry is not imposed on the theory. In 
fact, unlike protons and neutrons, the up and down quarks have quite different 
masses: according to one estimate, the down quark mass is almost twice the 


11 Quantum chromodynamics is part of our present theory of elementary particles and their interactions,
the Standard Model. Formulating and testing this model has been the work of many physicists. For an 
informal history see Weinberg, “Half a Century of the Standard Model,” listed in the bibliography. A more 
detailed account with references to much of this work can be found in Weinberg, The Quantum Theory of 
Fields, Vol. II: Modern Applications (Cambridge University Press, Cambridge, UK, 1996). 
up quark mass. The reason for the isotopic spin symmetry of strong forces
is just that there is no room in the theory for any violation of the symmetry
other than the quark masses, and the quark masses although unequal are very
small. Almost all of the mass of the proton and neutron comes from the strong
nuclear forces acting among the quarks within a single proton or neutron, not
from the quark masses.

The small mass difference between the proton and the neutron comes both 
from differences in the quark masses and from electromagnetic forces among 
the quarks, but the quark mass difference is somewhat more important. This is why
the neutron is heavier than the proton, even though the electric charges of the
quarks in the proton are larger than those in the neutron. It is both the smallness
of the quark masses and the relative weakness of electromagnetic effects that
makes the neutron-proton mass difference, 1.293 MeV/c^2, so tiny compared
with the proton mass, 938 MeV/c^2.


Pions 


Isotopic spin symmetry had important implications for the new strongly inter- 
acting particles discovered after World War II. The first of these particles was 
the pi meson, or pion as it is frequently called. In 1947 a group at the University 
of Bristol,12 studying photographic plates that had been exposed to cosmic
rays at high altitudes in the Pyrenees and Andes, found evidence of a strongly
interacting particle with a mass intermediate (hence the name “meson”) between
the electron and the nucleon. It is today known that these charged pions come
with charges +e and −e, both with masses 139.570 MeV/c^2. These particles
are produced singly in reactions such as p + p → p + n + π+, and so if
baryon number is conserved these particles must be supposed to have B = 0.
Equation (6.2.2) then indicates that the π+ and π− have T_3 = +1 and T_3 = −1,
respectively. No doubly charged particles with similar mass have ever been
found, so the pions cannot be part of an isotopic spin multiplet with t ≥ 2,
and therefore must be part of a triplet, with t = 1. The neutral T_3 = 0 member
of the triplet, the π0, was discovered at the Berkeley cyclotron in 1950, the
first particle to be found at an accelerator before being discovered in cosmic
rays. The mass of the π0 is now known to be 134.977 MeV/c^2.

In quantum chromodynamics, the π+ and π− are respectively u + d̄ and
d + ū, where u and d stand for up and down quarks, and the bar denotes
antiquarks. The π0 is a 50-50 superposition of u + ū and d + d̄. The quark
masses contribute equally to all three pions, so the ~4.6 MeV/c^2 mass
difference between charged and neutral pions is entirely due to electromagnetic
forces. In fact, this is the one mass difference in an isotopic spin multiplet 


12 C. M. G. Lattes, H. Muirhead, G. P. S. Occhialini, and C. F. Powell, Nature 159, 694 (1947).
of elementary particles that has been successfully calculated as a purely
electromagnetic effect. 

Although the charged and neutral pions are joined in an isotopic spin triplet,
their decays occur through interactions that do not respect isotopic spin
symmetry and hence they have very different decay rates and decay modes. The
neutral pion decays into two photons through purely electromagnetic
interactions, with mean lifetime $(8.52 \pm 0.18) \times 10^{-17}$ seconds. The
charged pion decays much more slowly through the weak interactions discussed in
Section 6.5, with mean lifetime $(2.6033 \pm 0.0005) \times 10^{-8}$ seconds,
primarily into a neutrino and a muon, a particle similar to an electron but
about 207 times heavier, discovered in cosmic rays in 1937.


Appendix: The Three-Three Resonance


There are no clearly identified multiplets of nuclear states larger than
triplets, but there is a conspicuous quartet of unstable particles that decay
into a nucleon and a pion, with masses all close to 1210 MeV/$c^2$. This is the
"three-three resonance" $\Delta$, where "three-three" means that it has
$t = 3/2$ and $j = 3/2$, and "resonance" indicates that these are seen as sharp
peaks in pion-nucleon scattering, interpreted as the formation of an unstable
intermediate state that decays back into a nucleon and a pion. As discussed at
the end of the appendix to Section 5.6, the total decay rate of each of these
four states is measured as the width of the peak of the cross section as a
function of energy, divided by $\hbar$; the rate of decay into any particular
pion-nucleon state equals the total decay rate times the branching ratio, the
fraction of scattering events at the resonant energy that produce that
pion-nucleon state.

Since the formation and decay of the $\Delta$ both indicate that it has the
same baryon number $B = 1$ as the nucleon, Eq. (6.2.2) indicates that the four
states of the quartet with charges $2e$, $e$, $0$, and $-e$ have $T_3 = 3/2$,
$T_3 = 1/2$, $T_3 = -1/2$, and $T_3 = -3/2$. Like the proton and neutron the
$\Delta$ states are interpreted as composites of three quarks: respectively
$uuu$, $uud$, $udd$, and $ddd$.

The three-three resonance provides a good example of the power of symmetry
principles such as isotopic spin symmetry to do more than dictate how energy
eigenstates are grouped into multiplets. The conservation of isotopic spin
tells us that the nucleon and pion produced when a $\Delta$ decays must be in a
state of total isotopic spin 3/2 rather than a mixture of isotopic spins 3/2
and 1/2. For a three-three resonance $\Delta$ with a given value of $T_3$, the
nucleon-pion state has wave function

$$ \sum_{\pm} C_{1,1/2}(3/2, T_3;\, T_3 \mp 1/2, \pm 1/2)\; \psi_{T_3 \mp 1/2,\, \pm 1/2} , $$

where $\psi_{T_3 \mp 1/2,\, \pm 1/2}$ is the wave function for a pion and a
nucleon with their third components of isotopic spin equal respectively to
$T_3 \mp 1/2$ and $\pm 1/2$, and $C_{1,1/2}(3/2, T_3;\, t, t')$ is the
Clebsch-Gordan coefficient discussed in Section 5.4. The rates of decay of a
three-three resonance with a given $T_3$ into various pion-nucleon states are
then given by

$$ \Gamma\big(\Delta(T_3) \to \pi(T_3 \mp 1/2) + N(\pm 1/2)\big) = \Gamma_\Delta \left| C_{1,1/2}(3/2, T_3;\, T_3 \mp 1/2, \pm 1/2) \right|^2 , $$

where $\Gamma_\Delta$ is the total decay rate of a three-three particle of any
charge; it is another consequence of isotopic spin symmetry that these total
decay rates are the same for all four charges of the $\Delta$. Looking up the
Clebsch-Gordan coefficients in Table 5.1 for combining states of spin 1 and 1/2
to form a state of spin 3/2, we see that for the $T_3 = 1/2$ state $\Delta^+$
we have

$$ \Gamma(\Delta^+ \to \pi^+ + n) = \Gamma_\Delta/3 , \qquad \Gamma(\Delta^+ \to \pi^0 + p) = 2\Gamma_\Delta/3 , $$

while, for the $T_3 = -1/2$ state $\Delta^0$,

$$ \Gamma(\Delta^0 \to \pi^- + p) = \Gamma_\Delta/3 , \qquad \Gamma(\Delta^0 \to \pi^0 + n) = 2\Gamma_\Delta/3 . $$

For the $\Delta^{++}$ and $\Delta^-$ there is only one available decay channel,
so without looking up Clebsch-Gordan coefficients we know that

$$ \Gamma(\Delta^{++} \to \pi^+ + p) = \Gamma(\Delta^- \to \pi^- + n) = \Gamma_\Delta . $$

These predictions were verified in experiments on pion-nucleon scattering
carried out by Fermi's group at Chicago in the early 1950s.
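The branching fractions quoted above can be checked without a table of
Clebsch-Gordan coefficients. The following is a minimal sketch, assuming the
standard lowering-operator construction: start from the unique top state of the
quartet (pion $T_3 = 1$, nucleon $T_3 = 1/2$) and apply the total lowering
operator repeatedly. The function names `lower`, `normalize`, and
`delta_branching` are illustrative choices, not anything taken from the text.

```python
from math import sqrt, isclose

def lower(state, t_pion, t_nucleon):
    """Apply T_- = T_-(pion) + T_-(nucleon) to {(m_pion, m_nucleon): amplitude}."""
    out = {}
    for (m1, m2), amp in state.items():
        if m1 > -t_pion:      # lower the pion's third component
            c = sqrt(t_pion * (t_pion + 1) - m1 * (m1 - 1))
            out[(m1 - 1, m2)] = out.get((m1 - 1, m2), 0.0) + amp * c
        if m2 > -t_nucleon:   # lower the nucleon's third component
            c = sqrt(t_nucleon * (t_nucleon + 1) - m2 * (m2 - 1))
            out[(m1, m2 - 1)] = out.get((m1, m2 - 1), 0.0) + amp * c
    return out

def normalize(state):
    norm = sqrt(sum(a * a for a in state.values()))
    return {key: a / norm for key, a in state.items()}

def delta_branching():
    """Return {T3: {(m_pion, m_nucleon): branching fraction}} for the t = 3/2 quartet."""
    t_pion, t_nucleon = 1.0, 0.5
    state = {(1.0, 0.5): 1.0}          # Delta++ : pi+ and proton only
    t3 = 1.5
    fractions = {t3: {key: a * a for key, a in state.items()}}
    for _ in range(3):                 # step down to Delta+, Delta0, Delta-
        state = normalize(lower(state, t_pion, t_nucleon))
        t3 -= 1.0
        fractions[t3] = {key: a * a for key, a in state.items()}
    return fractions

branching = delta_branching()
for t3 in (1.5, 0.5, -0.5, -1.5):
    print(t3, branching[t3])
```

Each key `(m_pion, m_nucleon)` labels a channel, for example `(1.0, -0.5)` is
$\pi^+ + n$ and `(0.0, 0.5)` is $\pi^0 + p$; the squared amplitudes are the
squared Clebsch-Gordan coefficients that multiply $\Gamma_\Delta$.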


6.3 Shell Structure 


In nuclei as in atoms it is a fair approximation to adopt a Hartree approximation, 
in which each nucleon feels an effective potential due to all the other nucleons. 
Neutrons and protons are fermions, so their states in nuclei are governed by 
the Pauli exclusion principle, like the states of electrons in atoms. In particular, 
there are nuclei in which protons or neutrons or both form closed shells like the 
electrons in noble gases, and therefore are more tightly bound than other nuclei 
of similar weight. 

The great difference between the closed shells in atoms and nuclei arises
from the difference in the form of their effective potentials. Both potentials
have approximate spherical symmetry, but in nuclei, unlike atoms, there is
nothing special at the center of symmetry that would make the nuclear potential
singular there. Since the nuclear potential is a function only of the radial
coordinate $r$, and is expected to be analytic in the Cartesian components of
the coordinate vector $\mathbf{x}$, it must be a power series in
$r^2 = \mathbf{x}^2$. Within some neighborhood of the origin, it is therefore
approximately linear in $\mathbf{x}^2$, a relation we shall write as

$$ V(\mathbf{x}) \to V_0 + \frac{m_N \omega^2}{2}\, \mathbf{x}^2 , \tag{6.3.1} $$

where $m_N$ can be taken as the mean nucleon mass, and $\omega$ is a constant
with the dimensions of frequency. The total Hamiltonian is then a sum of
one-nucleon Hamiltonians, each of the form

$$ H = V_0 + \frac{\mathbf{P}^2}{2 m_N} + \frac{m_N \omega^2}{2}\, \mathbf{X}^2 , \tag{6.3.2} $$

with $\mathbf{X}$ the operator that multiplies the wave function with the
coordinate argument $\mathbf{x}$, and $\mathbf{P}$ the operator that acts on
the wave function as the differential operator $-i\hbar\nabla$. This is the
Hamiltonian for a harmonic oscillator with circular frequency $\omega$, the
first problem solved using Heisenberg's matrix mechanics at the beginning of
quantum mechanics.$^{13}$

To find the spectrum of eigenvalues of this Hamiltonian, we introduce a
vector operator

$$ \mathbf{a} = \frac{1}{\sqrt{2 m_N \omega \hbar}}\, \mathbf{P} - i \sqrt{\frac{m_N \omega}{2\hbar}}\, \mathbf{X} . \tag{6.3.3} $$

Recalling the commutation relations (5.3.22),

$$ [X_i, P_j] = i\hbar\,\delta_{ij} , \qquad [X_i, X_j] = [P_i, P_j] = 0 , \tag{6.3.4} $$

it is straightforward to calculate that

$$ [a_i, a_j^\dagger] = \delta_{ij} , \qquad [a_i, a_j] = [a_i^\dagger, a_j^\dagger] = 0 . \tag{6.3.5} $$

The Hamiltonian (6.3.2) can be expressed as

$$ H = V_0 + \frac{\hbar\omega}{2} \left[ \mathbf{a} \cdot \mathbf{a}^\dagger + \mathbf{a}^\dagger \cdot \mathbf{a} \right] . $$

Using the commutators (6.3.5), this is

$$ H = V_0 + \frac{3}{2}\hbar\omega + \hbar\omega\, \mathbf{a}^\dagger \cdot \mathbf{a} . \tag{6.3.6} $$


The operators $\mathbf{a}^\dagger$ and $\mathbf{a}$ play the role of raising
and lowering operators for the energy. Using Eq. (6.3.6) and the commutation
relations (6.3.5), we easily see that

$$ [\mathbf{a}, H] = \hbar\omega\, \mathbf{a} , \qquad [\mathbf{a}^\dagger, H] = -\hbar\omega\, \mathbf{a}^\dagger . \tag{6.3.7} $$

It follows that if $H\psi = E\psi$, then

$$ H(\mathbf{a}^\dagger \psi) = (E + \hbar\omega)(\mathbf{a}^\dagger \psi) , \tag{6.3.8} $$

$$ H(\mathbf{a} \psi) = (E - \hbar\omega)(\mathbf{a} \psi) . \tag{6.3.9} $$


13 W. Heisenberg, Zeit. Phys. 33, 879 (1925). This article is reprinted in English in Van der Waerden, Sources
of Quantum Mechanics, listed in the bibliography.


We assume that there is a one-nucleon state with some minimum energy $E_0$. In
this case the wave function $\psi_0$ for this state must satisfy

$$ \mathbf{a}\,\psi_0 = 0 , \tag{6.3.10} $$

since otherwise according to Eq. (6.3.9) the wave function $\mathbf{a}\psi_0$
would be an energy eigenfunction with an even smaller energy,
$E_0 - \hbar\omega$. Using $\mathbf{a}\psi_0 = 0$, Eq. (6.3.6) then gives

$$ H\psi_0 = E_0 \psi_0 , \tag{6.3.11} $$

where $E_0$ is the minimum energy:

$$ E_0 = V_0 + \frac{3\hbar\omega}{2} . $$

The energy $\hbar\omega/2$ associated with each of the three coordinate
components is known as the zero-point energy. The appearance of a zero-point
energy for harmonic oscillators is an inevitable feature of quantum mechanics.
Inspection of the Hamiltonian (6.3.2) shows that for a state to have energy as
low as $V_0$ its wave function would have to be an eigenfunction of both
$\mathbf{P}$ and $\mathbf{X}$ with eigenvalues zero, which is impossible since
the commutator $[X_i, P_i] = i\hbar$ cannot vanish acting on any wave function.
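The ladder-operator algebra can be made concrete for a single Cartesian
component. The sketch below is an illustration under stated assumptions, not
the text's method: it represents $a$ by its matrix in a number basis truncated
to a hypothetical `SIZE` levels (the helper names are invented here), checks
$[a, a^\dagger] = 1$ away from the truncation edge, and reads off the levels
$\hbar\omega(n + 1/2)$ of one component, so that three components give the
zero-point energy $3\hbar\omega/2$.

```python
from math import sqrt, isclose

SIZE = 6  # number of levels kept; an arbitrary illustrative truncation

def lowering_matrix(size):
    """Matrix of a in the number basis: a|n> = sqrt(n)|n-1>."""
    a = [[0.0] * size for _ in range(size)]
    for n in range(1, size):
        a[n - 1][n] = sqrt(n)
    return a

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(x):
    return [list(row) for row in zip(*x)]

def commutator(x, y):
    xy, yx = matmul(x, y), matmul(y, x)
    n = len(x)
    return [[xy[i][j] - yx[i][j] for j in range(n)] for i in range(n)]

a = lowering_matrix(SIZE)
a_dag = transpose(a)             # real matrix, so the adjoint is the transpose
comm = commutator(a, a_dag)      # should be the unit matrix, except the last entry
number = matmul(a_dag, a)        # diagonal, with entries 0, 1, 2, ...

# Energies of one component in units of hbar*omega (V_0 omitted).
levels = [number[n][n] + 0.5 for n in range(SIZE)]
print([comm[i][i] for i in range(SIZE)])
print(levels)
```

The last diagonal entry of the commutator differs from 1 only because the
basis was cut off; it is an artifact of the truncation, not of the algebra.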

Equation (6.3.8) shows that acting on any wave function with any component
$a_i^\dagger$ raises the energy of the state by $\hbar\omega$, so we can find a
complete set of energy eigenfunctions

$$ \psi_{n_1 n_2 n_3} = (a_1^\dagger)^{n_1} (a_2^\dagger)^{n_2} (a_3^\dagger)^{n_3}\, \psi_0 \tag{6.3.12} $$

for which

$$ H\psi_{n_1 n_2 n_3} = (E_0 + n\hbar\omega)\, \psi_{n_1 n_2 n_3} , \tag{6.3.13} $$

where $n = n_1 + n_2 + n_3$.

We could just as well construct an eigenfunction of $H$ with eigenvalue
$E_0 + n\hbar\omega$ if in place of
$(a_1^\dagger)^{n_1} (a_2^\dagger)^{n_2} (a_3^\dagger)^{n_3}$ we operated on
$\psi_0$ with any homogeneous polynomial of order $n$ in the components of
$\mathbf{a}^\dagger$; that is, any sum of terms, each proportional to a product
like that in Eq. (6.3.12) of a total of $n$ factors of components of
$\mathbf{a}^\dagger$. In order to make clear the angular momentum content of
these states, it is much more convenient to use the set of homogeneous
polynomials encountered in Eq. (5.2.16). Expressed as a function of any vector
$\mathbf{v}$, these are

$$ \mathcal{Y}_\ell^m(\mathbf{v}) = |\mathbf{v}|^\ell\, Y_\ell^m(\hat{v}) , \tag{6.3.14} $$

where $Y_\ell^m$ is the spherical harmonic function described in Section 5.2,
with $\ell$ a non-negative integer and $m$ an integer running over the
$2\ell + 1$ values from $-\ell$ to $+\ell$. For instance,
$\mathcal{Y}_0^0(\mathbf{v})$ is a constant, and

$$ \mathcal{Y}_1^{\pm 1}(\mathbf{v}) = \mp\sqrt{\frac{3}{8\pi}}\, (v_1 \pm i v_2) , \qquad \mathcal{Y}_1^0(\mathbf{v}) = \sqrt{\frac{3}{4\pi}}\, v_3 . $$

We can find a complete set of states with energy $E_0 + n\hbar\omega$ and
angular momentum quantum number $\ell$ for which $n - \ell$ is an even
non-negative integer:

$$ \psi_{n\ell}^m = (\mathbf{a}^\dagger \cdot \mathbf{a}^\dagger)^{(n-\ell)/2}\, \mathcal{Y}_\ell^m(\mathbf{a}^\dagger)\, \psi_0 . \tag{6.3.15} $$

For instance, for $n = 0$ we have only $\ell = 0$, and $\psi_{00}^0$ is
proportional to the minimum-energy wave function $\psi_0$. For $n = 1$ we have
only $\ell = 1$, and $\psi_{11}^m = \mathcal{Y}_1^m(\mathbf{a}^\dagger)\,\psi_0$.
For $n = 2$ we have both $\ell = 2$, with
$\psi_{22}^m = \mathcal{Y}_2^m(\mathbf{a}^\dagger)\,\psi_0$, and also
$\ell = 0$, with
$\psi_{20}^0 \propto (\mathbf{a}^\dagger \cdot \mathbf{a}^\dagger)\,\psi_0$.

All but the lowest energy states are evidently degenerate. As we have seen, for
energy levels with $n = 1$ and $n = 2$ there are respectively three and
$5 + 1 =$ six states with energies respectively $E_0 + \hbar\omega$ and
$E_0 + 2\hbar\omega$. In general, the number $\mathcal{N}_n$ of states with
energy $E_0 + n\hbar\omega$ is the sum of $2\ell + 1$ for all non-negative
integers $\ell$ with $n - \ell$ an even non-negative integer $2\nu$. That is,
for $n$ even,

$$ \mathcal{N}_n = \sum_{\nu=0}^{n/2} (2n - 4\nu + 1) = (2n+1)(n/2 + 1) - 2(n/2)(n/2 + 1) , $$

and, for $n$ odd,

$$ \mathcal{N}_n = \sum_{\nu=0}^{(n-1)/2} (2n - 4\nu + 1) = (2n+1)\big((n-1)/2 + 1\big) - 2\big((n-1)/2\big)\big((n-1)/2 + 1\big) , $$

and so, whether $n$ is even or odd, the degeneracy (apart from spin) is

$$ \mathcal{N}_n = (n+1)(n+2)/2 . \tag{6.3.16} $$

This can be recognized as the number of ways an integer $n$ can be written as
a sum of three non-negative integers, so this is also the number of independent
wave functions $\psi_{n_1 n_2 n_3}$ with $n = n_1 + n_2 + n_3$ defined by
Eq. (6.3.12). Thus the wave functions (6.3.15) form a complete set of
eigenfunctions of $H$ with eigenvalue $E_0 + n\hbar\omega$.
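The agreement between the two ways of counting states is easy to confirm by
brute force. The following short sketch (function names are illustrative)
counts the triples $(n_1, n_2, n_3)$ of Eq. (6.3.12), sums $2\ell + 1$ over
$\ell = n, n-2, \ldots$ as in Eq. (6.3.15), and compares both with the closed
form (6.3.16).

```python
from itertools import product

def count_cartesian(n):
    """Number of triples (n1, n2, n3) of non-negative integers with n1 + n2 + n3 = n."""
    # n3 = n - n1 - n2 is fixed once n1 and n2 are chosen, so count valid pairs.
    return sum(1 for n1, n2 in product(range(n + 1), repeat=2) if n1 + n2 <= n)

def count_angular(n):
    """Sum of 2l + 1 over l = n, n - 2, ..., down to 0 or 1."""
    return sum(2 * ell + 1 for ell in range(n, -1, -2))

def closed_form(n):
    """Eq. (6.3.16): (n + 1)(n + 2) / 2."""
    return (n + 1) * (n + 2) // 2

for n in range(7):
    print(n, count_cartesian(n), count_angular(n), closed_form(n))
```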

It has been possible to work out the energy eigenvalues and their degeneracies
here (as Heisenberg did in 1925) without examining the form of these wave
functions as functions of the nucleon coordinates, but it will help to make our
discussion more concrete if we take a moment to look at these wave functions.
By using Eq. (6.3.3), the defining condition (6.3.10) for the wave function of
the state of minimum energy can be written explicitly as a first-order
differential equation

$$ \left( \nabla + \frac{m_N \omega}{\hbar}\, \mathbf{x} \right) \psi_0(\mathbf{x}) = 0 . \tag{6.3.17} $$

The solution (with arbitrary normalization) is

$$ \psi_0(\mathbf{x}) = \exp\left[ -\frac{m_N \omega\, \mathbf{x}^2}{2\hbar} \right] . \tag{6.3.18} $$

The wave functions $\psi_{n\ell}^m(\mathbf{x})$ can be found using
Eq. (6.3.15), with $\psi_0(\mathbf{x})$ given by Eq. (6.3.18), and with
$\mathbf{a}^\dagger$ replaced with the differential operator

$$ \mathbf{a}^\dagger = \frac{-i\hbar}{\sqrt{2 m_N \omega \hbar}}\, \nabla + i \sqrt{\frac{m_N \omega}{2\hbar}}\, \mathbf{x} . $$

For instance,

$$ \psi_{\ell\ell}^m(\mathbf{x}) \propto |\mathbf{x}|^\ell\, Y_\ell^m(\hat{x}) \exp\left[ -\frac{m_N \omega\, \mathbf{x}^2}{2\hbar} \right] . $$

Taking into account the two spin states of a nucleon, the actual degeneracy of
the energy level $E = E_0 + n\hbar\omega$ is twice the quantity (6.3.16), or

$$ (n+1)(n+2) = 2,\ 6,\ 12,\ 20,\ 30,\ 42,\ \ldots $$

This leads to the expectation that the protons or neutrons in a nucleus would
all form closed shells if the number of protons or of neutrons were equal to

$$ 2, \quad 2 + 6 = 8, \quad 8 + 12 = 20, \quad \text{etc.} $$

These are the so-called magic numbers of nuclear physics,$^{14}$ analogous to
the atomic numbers 2, 10, 18, etc., of the noble gases in atomic physics. We
expect nuclei with a magic number of protons or neutrons to be more deeply
bound and hence more abundant than other nuclei with similar numbers of
neutrons and protons. A nucleus is likely to be particularly deeply bound if it
is doubly magic, with a magic number of both protons and neutrons. Indeed, the
lightest doubly magic nuclei are $^{4}$He, $^{16}$O, and $^{40}$Ca, which are
more tightly bound and abundant than other nuclei of similar weight.

One might expect the magic number following 20 to be 20 + 20 = 40, but this
is not the case. The degenerate multiplets we found for the harmonic oscillator
begin, for heavier nuclei, to be split in energy, both by the interaction of
the spin and orbital angular momenta of the nucleons and by the breakdown of
the harmonic oscillator approximation (6.3.1) as nucleons in high energy levels
spend increasing time away from the nuclear center. In particular, there is a
term in the Hamiltonian for each nucleon proportional to
$\mathbf{S} \cdot \mathbf{L}$ with a large negative coefficient, which for each
$n$ lowers the energy of the single-nucleon state with the largest orbital
angular momentum $\ell = n$ and largest total angular momentum,
$j = \ell + 1/2 = n + 1/2$, below the energies of other single-nucleon states
with the same $n$.


14 M. Goeppert-Mayer and J. H. D. Jensen, Elementary Theory of Nuclear Shell Structure (Wiley, New York,
1955).

Without these corrections, the $n = 3$ energy level would have 20 degenerate
states with $\ell = 3$ and $\ell = 1$, but these corrections lower the eight
$j = 7/2$ states below the other 12 states, so the magic number following 20 is
not 40, but 20 + 8 = 28. The element with 28 protons is nickel, which is known
to be produced abundantly by nuclear reactions occurring in core-collapse
supernovae. The most abundant isotope of nickel is not the doubly magic
$^{56}$Ni; this isotope is less abundant than either $^{58}$Ni or $^{60}$Ni,
which have a magic number only of protons. This is because the negative nuclear
potential energy of the additional neutrons is needed to compensate for the
Coulomb repulsion of the 28 protons. Even so, as noted in Section 3.4, the deep
binding of nickel isotopes makes nickel an exception to the rule that atomic
weight steadily increases with atomic number.

The same pattern repeats for larger nucleon numbers. The next shell has
nucleons in the 20 - 8 = 12 states with $n = 3$ and $j < 7/2$, and in the 10
$n = 4$ states with $\ell = 4$ and $j = \ell + 1/2 = 9/2$, giving a magic
number 28 + 12 + 10 = 50. The next shell has nucleons in the 30 - 10 = 20
states with $n = 4$ and $j < 9/2$, and in the 12 $n = 5$ states with
$\ell = 5$ and $j = \ell + 1/2 = 11/2$, giving a magic number
50 + 20 + 12 = 82. Finally, the next shell has nucleons in the 42 - 12 = 30
states with $n = 5$ and $j < 11/2$, and in the 14 $n = 6$ states with
$\ell = 6$ and $j = \ell + 1/2 = 13/2$, giving a magic number
82 + 30 + 14 = 126. Thus the complete list of magic numbers is


2, 8, 20, 28, 50, 82, 126. 


The only stable doubly magic nucleus heavier than calcium 40 is lead 208. 
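The bookkeeping behind this list can be written out in a few lines. The sketch
below encodes only the counting rules stated in this section, under the
assumption that the splitting affects shells from $n = 3$ onward: the
oscillator level $n$ holds $(n+1)(n+2)$ nucleons, and its
$j = n + 1/2$ sublevel, with $2n + 2$ states, drops into the shell below. The
function names are illustrative.

```python
def shell_capacity(n):
    """Spin-doubled degeneracy (n + 1)(n + 2) of oscillator level n."""
    return (n + 1) * (n + 2)

def intruder_capacity(n):
    """Number of states 2j + 1 = 2n + 2 with l = n and j = n + 1/2."""
    return 2 * n + 2

def magic_numbers(n_max=6):
    """Magic numbers up to the shell closed by the j = n_max + 1/2 states."""
    magic, total = [], 0
    # Pure harmonic oscillator shells n = 0, 1, 2 give 2, 8, 20.
    for n in range(3):
        total += shell_capacity(n)
        magic.append(total)
    # The eight j = 7/2 states of n = 3 form a shell of their own.
    total += intruder_capacity(3)
    magic.append(total)
    # Each later shell: the rest of level n plus the lowered states of level n + 1.
    for n in range(3, n_max):
        total += shell_capacity(n) - intruder_capacity(n) + intruder_capacity(n + 1)
        magic.append(total)
    return magic

print(magic_numbers())
```

This reproduces the arithmetic of the text and nothing more; it does not
attempt to compute the level splittings that justify the rules.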


6.4 Alpha Decay 


As we saw in Section 3.3, in the first decade of the twentieth century Rutherford 
and his collaborators were able to distinguish two kinds of radioactivity. One 
was beta decay, the subject of Section 6.5. The other was alpha decay, the 
emission of a charged alpha particle, soon identified as a helium 4 nucleus. 
These alpha particles furnished Rutherford with a probe of atomic structure, 
with which he discovered the nucleus of the atom. 

Alpha decay has the remarkable feature that to get out of the nucleus the
alpha particle must pass through a potential barrier that according to
classical physics it cannot inhabit, because the potential energy there is
greater than the total energy of the alpha particle. Only because of the wave
nature of particles in quantum mechanics is it possible for the alpha particle
to leak through the barrier. The presence of this barrier gives the rate of
alpha decay an extreme sensitivity to the energy of the emitted alpha particle
and the radius of the nucleus. Similar Coulomb barriers govern the rate of
spontaneous nuclear fission and of nuclear reactions in stars.

We will assume spherical symmetry, and to avoid mathematical complications
consider only s-wave ($\ell = 0$) decays, which are the most common. The
Schrödinger equation (5.2.19) for the radial wave function $R_E(r)$ with alpha
particle energy $E$ and $\ell = 0$ takes the form

$$ -\frac{\hbar^2}{2 m_\alpha} \frac{1}{r^2} \frac{d}{dr} \left( r^2 \frac{dR_E(r)}{dr} \right) + V(r) R_E(r) = E R_E(r) , \tag{6.4.1} $$

where $V(r)$ is taken to include both the Coulomb repulsion and the nuclear
attraction between the alpha particle and the rest of the nucleus. We take
$E > 0$, so that it is energetically possible for the alpha particle to exist
far from the nucleus. It proves very convenient to write this instead as a
differential equation for the reduced wave function $u_E(r) \equiv r R_E(r)$:

$$ -\frac{\hbar^2}{2 m_\alpha} \frac{d^2}{dr^2} u_E(r) + V(r) u_E(r) = E u_E(r) . \tag{6.4.2} $$

As we saw in Section 5.2, the boundary condition for general orbital angular
momentum $\ell$ is that, for $r \to 0$, $R_E(r)$ is proportional to $r^\ell$
and hence $u_E(r)$ is proportional to $r^{\ell+1}$, so for $\ell = 0$ the
condition is that $u_E(r) \propto r$ for $r \to 0$.

It is assumed that for $r$ less than the nuclear radius $R$ the potential
$V(r)$ is dominated by the nuclear attraction, which gives it negative values.
For $r$ greater than $R$ the nuclear attraction is presumed to be ineffective,
so $V(r)$ becomes positive:

$$ V(r) = \frac{2Ze^2}{r} \quad \text{for } r > R , \tag{6.4.3} $$

where $Ze$ is the electric charge of the final nucleus. We assume that for some
range of $r$ greater than $R$, this potential is greater than $E$. This is the
region that classically cannot be inhabited by the alpha particle. (See
Fig. 6.1.)

Figure 6.1 An example of the potential $V(r)$ felt by an alpha particle at a
distance $r$ from the nuclear center.

To see how the wave function behaves in this region, it is convenient to
rewrite Eq. (6.4.2) for $r > R$ as

$$ \frac{d^2}{dr^2} u_E(r) = \kappa_E^2(r)\, u_E(r) , \tag{6.4.4} $$

where $\kappa_E(r)$ can be taken as the positive square root,

$$ \kappa_E(r) = +\sqrt{ \frac{2 m_\alpha}{\hbar^2} \big( V(r) - E \big) } . \tag{6.4.5} $$

We note that if $V(r)$ and hence $\kappa_E(r)$ were independent of $r$, then
Eq. (6.4.4) would have solutions proportional to $\exp(\pm\kappa_E r)$. It
therefore may be guessed that if $\kappa_E(r)$ varies sufficiently slowly with
$r$ then the wave function within the barrier takes the approximate form

$$ u_E(r) = C_+(E)\, A_{E+}(r) \exp\left( +\int_R^r \kappa_E(r')\, dr' \right) + C_-(E)\, A_{E-}(r) \exp\left( -\int_R^r \kappa_E(r')\, dr' \right) , \tag{6.4.6} $$

where the amplitudes $A_{E\pm}(r)$ vary more slowly than the exponentials, and
the $C_\pm(E)$ are $r$-independent factors determined by the conditions that
the values and first derivatives of $u_E(r)$ are continuous at the nuclear
radius $R$. The appendix to this section shows that
$A_{E+}(r) = A_{E-}(r) = 1/\sqrt{\kappa_E(r)}$, and describes the conditions on
$\kappa_E(r)$ under which Eq. (6.4.6) is a good approximation.

Now, if the barrier extended to infinity with $V(r) > E$ then the only
allowed values of energy would be those for which the growing exponential term
in Eq. (6.4.6) was absent, which would require that $E$ takes a value where
$C_+(E) = 0$. These would be the energies of the true bound states of the alpha
particle in the nucleus. In fact, $V(r)$ falls to the value $E$ at a radial
coordinate $r = b_E$:

$$ b_E = 2Ze^2/E \tag{6.4.7} $$

and $V(r) < E$ for $r > b_E$. The condition $C_+(E) = 0$ picks out the energies
of unstable states, for which the wave function becomes exponentially small
outside the barrier, though not zero.

For instance, if $V(r)$ in the nucleus were a negative constant $-V_0$, then
the general solution of Eq. (6.4.2) for $r < R$ would be a linear combination
of $\sin qr$ and $\cos qr$, where

$$ q = +\frac{1}{\hbar} \sqrt{ 2 m_\alpha (E + V_0) } . \tag{6.4.8} $$

The boundary condition that $u_E(r) \propto r$ for $r \to 0$ tells us that
(with an arbitrary normalization) the physical solution for $r < R$ is

$$ u_E(r) = \sin qr . \tag{6.4.9} $$

In this case the continuity at $R$ of the values and first derivatives of the
wave functions (6.4.6) and (6.4.9) (with
$A_{E\pm}(r) = 1/\sqrt{\kappa_E(r)}$ assumed to vary much more slowly than the
exponentials) gives

$$ \frac{1}{\sqrt{\kappa_E(R)}} \left[ C_+ + C_- \right] = \sin qR , \qquad \sqrt{\kappa_E(R)} \left[ C_+ - C_- \right] = q \cos qR , $$

and therefore, for a constant potential in the nucleus,

$$ C_\pm(E) \simeq \frac{\sqrt{\kappa_E(R)}}{2} \left[ \sin qR \pm \left( \frac{q}{\kappa_E(R)} \right) \cos qR \right] . \tag{6.4.10} $$

The condition $C_+(E) = 0$ requires that $\tan qR = -q/\kappa_E(R)$. For a very
deep potential well, with $\kappa_E(R)$ much less than
$\sqrt{2 m_\alpha V_0}/\hbar$ and hence much less than $q$, the unstable state
with lowest energy has $q$ slightly greater than $\pi/2R$.

At a value of $E$ where $C_+(E) = 0$, the wave function outside the barrier is
suppressed by a factor $\exp(-G(E))$, where

$$ G(E) \equiv \int_R^{b_E} \kappa_E(r)\, dr = \frac{\sqrt{2 m_\alpha E}}{\hbar} \int_R^{b_E} \sqrt{ \frac{b_E}{r} - 1 }\; dr = \frac{\sqrt{4 m_\alpha Z e^2 b_E}}{\hbar}\, f(R/b_E) = \frac{4Ze^2}{\hbar v_E}\, f(R/b_E) , \tag{6.4.11} $$

where

$$ f(x) \equiv \int_x^1 \sqrt{ \frac{1}{z} - 1 }\; dz = \frac{\pi}{2} - \sqrt{x(1-x)} - \arcsin\sqrt{x} , \tag{6.4.12} $$

and $v_E = \sqrt{2E/m_\alpha}$ is the velocity of the alpha particle when it
escapes far from the nucleus. At the energy of an unstable state, where
$C_+(E) = 0$, the probability density $|R_E(b_E)|^2$ at the outer radius of the
barrier is suppressed by a factor of order $\exp(-2G(E))$.

In the earliest successful theory of alpha decay,$^{15}$ this factor was
interpreted as the probability that an alpha particle coming out of the nucleus
would penetrate the Coulomb barrier. That is, the rate $\Gamma_\alpha$ of alpha
decay was presumed to take the form

$$ \Gamma_\alpha = \nu \exp(-2G(E)) , \tag{6.4.13} $$

where $\nu$ is some sort of rate factor that reflects conditions within the
nucleus. The factor $\nu$ is commonly estimated as the rate $\nu \approx V/R$
at which alpha particles inside the nucleus classically would strike the
nuclear surface, where $V$ is a typical alpha particle velocity inside the
nucleus and $R$ is the nuclear radius. As we have seen, for a very deep
potential well the alpha particle wave number inside the nucleus is close to
$\pi/2R$, so $V/R \approx \pi\hbar/2 m_\alpha R^2$, which for a large nucleus
with $R \approx 9 \times 10^{-13}$ cm is $3 \times 10^{20}$ sec$^{-1}$. The
rate factor $\nu$ is usually quoted as $10^{21}$ sec$^{-1}$.

This is sometimes expressed in terms of the spacing of energy levels. For a
flat deep nuclear potential with $q \gg \kappa_E(R)$, the energy levels of
unstable states where $C_+(E)$ vanishes are at $qR \approx (n + 1/2)\pi$ with
$n = 0, 1, 2, \ldots$, so that their wave numbers are spaced by
$\Delta q = \pi/R$. The spacing $D$ in energy is then
$D \approx (dE/dq)\Delta q = \pi\hbar V/R$, so $V/R \approx D/\pi\hbar$.$^{16}$
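As a quick arithmetic check of these estimates, the sketch below evaluates
$V/R \approx \pi\hbar/2 m_\alpha R^2$ and the corresponding level spacing
$D \approx \pi\hbar V/R$ for $R = 9 \times 10^{-13}$ cm. The numerical
constants are rounded CGS values supplied here as assumptions (they are not
quoted in the text), and the function names are illustrative.

```python
from math import pi

# Rounded CGS constants, assumed for this illustration.
HBAR = 1.0546e-27        # erg sec
M_ALPHA = 6.6447e-24     # g, mass of the alpha particle
ERG_PER_MEV = 1.6022e-6  # erg per MeV

def strike_rate(radius_cm):
    """V/R for wave number q = pi/(2R): V = hbar*q/m_alpha, so V/R = pi*hbar/(2*m_alpha*R^2)."""
    return pi * HBAR / (2.0 * M_ALPHA * radius_cm ** 2)

def level_spacing_mev(radius_cm):
    """Level spacing D = pi*hbar*V/R, converted from erg to MeV."""
    return pi * HBAR * strike_rate(radius_cm) / ERG_PER_MEV

R_LARGE = 9.0e-13  # cm, the radius used in the text's estimate
print(f"V/R ~ {strike_rate(R_LARGE):.2e} per sec")
print(f"D   ~ {level_spacing_mev(R_LARGE):.2f} MeV")
```

The rate comes out at a few times $10^{20}$ sec$^{-1}$, in line with the
estimate above, and the implied level spacing is a fraction of an MeV.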

The appendix to this section gives a thoroughly quantum-mechanical derivation
of the decay rate that dispenses with the semi-classical picture of an alpha
particle in the nucleus striking the nuclear surface and occasionally leaking
through. The rate of decay of an unstable state with energy $E_1$ is found to
be given by Eq. (6.4.54):

$$ \Gamma_\alpha = \left| \frac{C_-(E_1)}{\hbar\, C_+'(E_1)} \right| e^{-2G(E_1)} . \tag{6.4.14} $$

The factor multiplying $e^{-2G(E_1)}$ in Eq. (6.4.14) is of the same order of
magnitude as the rate factor $V/R$. For instance, for a flat nuclear potential,
Eq. (6.4.10) suggests that the derivative of $C_+(E)$ with respect to wave
number is of order $\kappa_E^{1/2}(R)\, R$, while $C_-(E_1)$ is of order
$\kappa_E^{1/2}(R)$, so $C_+'(E)/C_-(E)$ is of order $(dq/dE) R = R/\hbar V$
and the factor $C_-(E_1)/\hbar C_+'(E_1)$ in Eq. (6.4.14) is therefore of order
$V/R$.


15 G. Gamow, Zeit. f. Physik 52, 510 (1929); E. U. Condon and R. W. Gurney, Phys. Rev. 33, 127 (1929).

16 The rate factor $\nu$ multiplying $\exp(-2G)$ is sometimes instead estimated as $\nu \approx D/2\pi\hbar$; for instance,
see J. M. Blatt and V. F. Weisskopf, Theoretical Nuclear Physics (John Wiley & Sons, New York, 1952),
Section XI.2.

It must be admitted that taking the rate factor $\nu$ as
$|C_-(E_1)/\hbar C_+'(E_1)|$ instead of $V/R$ or $D/\pi\hbar$ is not very
important, because none of these estimates take into account the probability
that an alpha particle will somehow become detached inside the nucleus from the
rest of the nucleus. But at least Eq. (6.4.14) is a precise statement (for
thick barriers with a slowly varying potential) of the rate at which an alpha
particle that has become detached inside the nucleus will escape, and it does
not depend on semi-classical hand-waving.

This theory does correctly describe the extreme sensitivity of alpha particle
decay rates to the energy and the nuclear radius, due almost entirely to the
barrier penetration factor $\exp(-2G(E))$. In particular, without needing to
worry about the rate factor $\nu$ we can use the above results for the barrier
penetration exponent $G(E)$ to understand the trend of the dependence of the
logarithm of the mean lifetime $\tau_\alpha = 1/\Gamma_\alpha$ on energy. Note
that for a thick barrier with $b_E \gg R$, the leading and next-to-leading
terms in the expansion of Eq. (6.4.11) in powers of $R/b_E$ give

$$ G(E) = \frac{4Ze^2}{\hbar v_E} \left[ \frac{\pi}{2} - 2\sqrt{\frac{R}{b_E}} + O\!\left( \left( \frac{R}{b_E} \right)^{3/2} \right) \right] . \tag{6.4.15} $$


Since $v_E \propto \sqrt{E}$ and $b_E \propto 1/E$, we have

$$ \ln \tau_\alpha \propto G(E) = \frac{\alpha}{\sqrt{E}} + \beta + O(E) , \tag{6.4.16} $$

with $\alpha$ and $\beta$ constant in energy. This dependence of
$\ln \tau_\alpha$ on energy was originally noticed in 1911 as a dependence of
the alpha particle range in air on energy, and in that form is known as the
Geiger-Nuttall law.$^{17}$

For a numerical example let us consider the historically important decay
process $^{226}$Ra $\to$ $^{222}$Rn + $^{4}$He. The nuclei $^{226}$Ra,
$^{222}$Rn, and $^{4}$He all have spin zero and even parity, so the alpha
particle in this decay has $\ell = 0$, as we assumed in our calculation of
$G(E)$. The alpha particles from this decay have a velocity
$v_E = 1.519 \times 10^{9}$ cm/sec, and radon has $Z = 86$, so here the first
factor in Eq. (6.4.11) for $G(E)$ is $4Ze^2/\hbar v_E = 49.55$. Also,
$b_E = 5.18 \times 10^{-12}$ cm. According to Eq. (6.1.1) the radius of
$^{222}$Rn is approximately $7.9 \times 10^{-13}$ cm, to which we should add
the radius $\approx 2 \times 10^{-13}$ cm of $^{4}$He, and so the effective
nuclear radius here is $R \approx 9.9 \times 10^{-13}$ cm, and
$R/b_E \approx 0.19$. The function (6.4.12) is then $f(R/b_E) = 0.72$.
Equation (6.4.11) then gives $G(E) = 35.7$, and the barrier penetration
probability is $\exp(-2G) \approx 10^{-31}$. If we take
$\nu \approx 10^{21}$ sec$^{-1}$ then Eq. (6.4.13) gives a radium mean life
$1/\Gamma_\alpha$ of order $10^{10}$ sec. It is the smallness of the factor
$\exp(-2G)$ that is responsible for the radium 226 nuclei produced in a chain
of radioactive decays from uranium 238 living long enough to be discovered in
uranium ores in 1898 by Marie and Pierre Curie. The predicted mean lifetime, of
order $10^{10}$ sec, may be compared with the measured mean life of 2300 years
$= 7 \times 10^{10}$ sec. The agreement, such as it is, is somewhat accidental,
because the decay rate is so sensitive to the nuclear radius $R$. For instance,
if we had taken the effective nuclear radius as $R = 9.3 \times 10^{-13}$ cm
instead of $R = 9.9 \times 10^{-13}$ cm, then with everything else the same we
would have found a predicted mean life of 5600 years. Indeed, rather than using
known values of $R$ to calculate alpha decay rates of various nuclei, the
observed decay rates were historically used to estimate $R$. For this purpose,
it is not important to be precise about the value of the factor $\nu$
multiplying $\exp(-2G)$ in Eq. (6.4.13). But it is worth trying to be precise
about this in order to make sure that we understand the decay process.


17 H. Geiger and J. M. Nuttall, Phil. Mag. 22, 613 (1911); 23, 439 (1912).
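The radium estimate can be reproduced numerically from Eqs. (6.4.7) and
(6.4.11) through (6.4.13). The following is a minimal sketch, assuming rounded
CGS values for $\hbar$, $e$, and the alpha particle mass (these constants and
the function names are supplied here for illustration). Because the text
rounds $f(R/b_E)$ to two figures, the exponent computed this way differs
slightly from the quoted 35.7, which is itself a reminder of how sensitive the
lifetime is to small changes.

```python
from math import asin, exp, pi, sqrt

# Rounded CGS constants, assumed for this illustration.
HBAR = 1.0546e-27       # erg sec
E_CHARGE = 4.8032e-10   # esu
M_ALPHA = 6.6447e-24    # g

def barrier_function(x):
    """f(x) of Eq. (6.4.12), valid for 0 <= x <= 1."""
    return pi / 2.0 - sqrt(x * (1.0 - x)) - asin(sqrt(x))

def gamow_exponent(z_daughter, v_alpha, radius):
    """G(E) of Eq. (6.4.11) for daughter charge Z, alpha velocity (cm/sec), radius (cm)."""
    energy = 0.5 * M_ALPHA * v_alpha ** 2               # E in erg
    b_e = 2.0 * z_daughter * E_CHARGE ** 2 / energy     # outer turning point, Eq. (6.4.7)
    prefactor = 4.0 * z_daughter * E_CHARGE ** 2 / (HBAR * v_alpha)
    return prefactor * barrier_function(radius / b_e)

def mean_life(z_daughter, v_alpha, radius, rate_factor=1.0e21):
    """Mean life 1/Gamma from Eq. (6.4.13), with nu = rate_factor in 1/sec."""
    return exp(2.0 * gamow_exponent(z_daughter, v_alpha, radius)) / rate_factor

SECONDS_PER_YEAR = 3.156e7
for radius in (9.9e-13, 9.3e-13):
    g = gamow_exponent(86, 1.519e9, radius)
    tau = mean_life(86, 1.519e9, radius)
    print(f"R = {radius:.1e} cm: G = {g:.1f}, mean life ~ {tau / SECONDS_PER_YEAR:.0f} yr")
```

Reducing the radius by about 6 percent lengthens the predicted mean life by
roughly an order of magnitude, which is the sensitivity described above.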


Appendix: Quantum Theory of Barrier Penetration Rates 


This appendix presents a thoroughly quantum-mechanical solution of a somewhat
artificial problem. We consider a particle in a negative nuclear potential well
surrounded by a positive potential barrier, whose wave function is initially
confined to the nuclear potential well, and we calculate the rate at which the
particle escapes to infinity, without relying on the semi-classical picture of
particles in the nucleus continually banging into the potential barrier and
occasionally leaking through. The calculations in this appendix do not depend
on the detailed form of the potential in the barrier, and so apply also to the
case where $\ell \neq 0$, where, as in Eq. (5.2.19), we include a centrifugal
term $\hbar^2 \ell(\ell+1)/2 m_\alpha r^2$ in the potential.

Our strategy will be to assume some initial wave function for the particle,
entirely confined in the nuclear potential well, expand it in orthonormalized
solutions $\psi_E(r)$ of the Schrödinger equation for various energies $E$,
give each such solution a time dependence $\exp(-iEt/\hbar)$, and see what
happens as the time increases.$^{18}$ In the course of this calculation we will
be able to give an idea of the conditions under which the approximation (6.4.6)
is valid, and find the amplitudes $A_{E\pm}(r)$.

Our first task is to calculate the not-yet-normalized reduced wave function
inside the barrier, where it satisfies Eq. (6.4.4). This differential equation
has no general analytic solution. We again guess that if the potential varies
slowly (in a sense to be determined) then Eq. (6.4.4) has approximate solutions

$$ u_E(r) = A_{E\pm}(r) \exp\left[ \pm \int \kappa_E(r)\, dr \right] , \tag{6.4.17} $$

where the amplitudes $A_{E\pm}(r)$ vary more slowly than the exponentials.
Before making any approximations, Eq. (6.4.4) can be written as a differential
equation for $A_{E\pm}(r)$:

$$ 2\kappa_E A_{E\pm}' + \kappa_E' A_{E\pm} \pm A_{E\pm}'' = 0 . \tag{6.4.18} $$

We can implement the approximation that $A_{E\pm}(r)$ varies slowly by dropping
the second derivative $A_{E\pm}''(r)$, solving Eq. (6.4.18), using the solution
to calculate $A_{E\pm}''(r)$, and checking under what conditions it may indeed
be neglected. With the term $\pm A_{E\pm}''(r)$ dropped, Eq. (6.4.18) becomes
$A_{E\pm}'/A_{E\pm} = -\kappa_E'/2\kappa_E$, which has the easy solution
$A_{E\pm}(r) \propto 1/\sqrt{\kappa_E(r)}$. Then

$$ \frac{A_{E\pm}''}{\kappa_E' A_{E\pm}} = -\frac{\kappa_E''}{2\kappa_E \kappa_E'} + \frac{3\kappa_E'}{4\kappa_E^2} . $$

Thus $A_{E\pm}''$ is indeed negligible compared with the term
$\kappa_E' A_{E\pm}$ in Eq. (6.4.18) if

$$ \left| \frac{\kappa_E'}{\kappa_E^2} \right| \ll 1 \quad \text{and} \quad \left| \frac{\kappa_E''}{\kappa_E \kappa_E'} \right| \ll 1 , \tag{6.4.19} $$

which is to say that both $\kappa_E'(r)$ and $\kappa_E(r)$ undergo only small
fractional changes in a distance of order $1/\kappa_E(r)$. Under these
conditions, Eq. (6.4.4) has the two independent approximate solutions

$$ \frac{1}{\sqrt{\kappa_E(r)}} \exp\left( \pm \int \kappa_E(r)\, dr \right) . $$

This is known as the WKB approximation.$^{19}$ We can write the general
solution of Eq. (6.4.4) inside the barrier as a linear combination of these
solutions:

$$ u_E(r) = \frac{C_+(E)}{\sqrt{\kappa_E(r)}} \exp\left( +\int_R^r \kappa_E(r')\, dr' \right) + \frac{C_-(E)}{\sqrt{\kappa_E(r)}} \exp\left( -\int_R^r \kappa_E(r')\, dr' \right) . \tag{6.4.20} $$


18 This follows the approach of E. Fermi, Nuclear Physics, lecture notes compiled by J. Orear, A. H.
Rosenfeld, and R. A. Schluter, revised ed. (University of Chicago Press, Chicago, 1950), Chapter III.
The treatment in this appendix is somewhat simplified by working throughout with continuum wave
functions, and supplies some justifications skipped over by Fermi.

Beyond the barrier, where $r > b_E$ (with $V(b_E) = E$), it is convenient to
write the Schrödinger equation (6.4.2) for the reduced wave function as

$$ \frac{d^2}{dr^2} u_E(r) = -k_E^2(r)\, u_E(r) , \tag{6.4.21} $$

where

$$ k_E(r) = +\frac{1}{\hbar} \sqrt{ 2 m_\alpha \big( E - V(r) \big) } . \tag{6.4.22} $$

Following the same arguments as before, provided that

$$ \left| \frac{k_E'}{k_E^2} \right| \ll 1 \quad \text{and} \quad \left| \frac{k_E''}{k_E k_E'} \right| \ll 1 , \tag{6.4.23} $$

we can use the WKB approximation to find solutions

$$ u_E(r) \propto \frac{1}{\sqrt{k_E(r)}} \cos\left( \int k_E(r)\, dr + \vartheta \right) , \tag{6.4.24} $$

where $\vartheta$ is any angle. We have two independent solutions, given by
using Eq. (6.4.24) with $\vartheta$ taken as two different angles.


19 G. Wentzel, Zeit. f. Phys. 38, 518 (1926); H. A. Kramers, Zeit. f. Phys. 39, 828 (1926); L. Brillouin,
Compt. Rendus Acad. Sci. 183, 24 (1926).

We need to work out how each of the two independent solutions of the
Schrödinger equation inside the Coulomb barrier, for $r < b_E$, merges with
linear combinations of the two independent solutions beyond the barrier, where
$r > b_E$. Unfortunately we cannot do this by equating the value and derivative
of the WKB solutions for $r$ just below and just above $b_E$, because
$\kappa_E(r)$ and $k_E(r)$ both vanish at $r = b_E$, and so the conditions
(6.4.19) and (6.4.23) for the validity of the WKB approximation break down near
$b_E$. This is a well-known problem in the use of the WKB approximation to
calculate bound state energies, but here we will encounter an additional
difficulty.

We will make the reasonable assumption that V(r) — E approaches a function 
proportional to bg — r for r near bz. In this case, forr > bz, 


ke > Bevr—be for r>b_z, (6.4.25) 


with Bz a positive function of E. It is convenient to define a new independent 
variable 


o= i ke(r')dr’ > FE — bp)”. (6.4.26) 
be 


The Schrödinger equation (6.4.21) then takes the form

$$\frac{d^2u}{d\phi^2} + \frac{1}{3\phi}\frac{du}{d\phi} + u = 0, \tag{6.4.27}$$

with two independent solutions

$$u \propto \phi^{1/3}J_{\pm 1/3}(\phi), \tag{6.4.28}$$

where $J_\nu$ is the usual Bessel function of order $\nu$. Likewise, for $r < b_E$ we have

$$\kappa_E \to \beta_E\sqrt{b_E - r} \quad\text{for}\quad r \to b_E, \tag{6.4.29}$$

with $\beta_E$ the same positive function of $E$ as in Eq. (6.4.25). Here it is convenient
to define

$$\tilde\phi \equiv \int_{r}^{b_E} \kappa_E(r')\,dr' \to \frac{2\beta_E}{3}(b_E - r)^{3/2}. \tag{6.4.30}$$


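The Schrödinger equation (6.4.4) then takes the form

$$\frac{d^2u}{d\tilde\phi^2} + \frac{1}{3\tilde\phi}\frac{du}{d\tilde\phi} - u = 0, \tag{6.4.31}$$

with two independent solutions

$$u \propto \tilde\phi^{1/3}I_{\pm 1/3}(\tilde\phi), \tag{6.4.32}$$

where here $I_\nu(\tilde\phi)$ is the Bessel function of order $\nu$ with imaginary argument:

$$I_\nu(\tilde\phi) = e^{-i\pi\nu/2}J_\nu\left(e^{i\pi/2}\tilde\phi\right). \tag{6.4.33}$$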


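To see how the solutions for $r > b_E$ and $r < b_E$ merge with each other at
$r = b_E$, we note that, for $\phi \to 0$,

$$\phi^{1/3}J_{1/3}(\phi) \to \frac{\phi^{2/3}}{2^{1/3}\Gamma(4/3)}, \qquad \phi^{1/3}J_{-1/3}(\phi) \to \frac{2^{1/3}}{\Gamma(2/3)},$$

while, for $\tilde\phi \to 0$,

$$\tilde\phi^{1/3}I_{1/3}(\tilde\phi) \to \frac{\tilde\phi^{2/3}}{2^{1/3}\Gamma(4/3)}, \qquad \tilde\phi^{1/3}I_{-1/3}(\tilde\phi) \to \frac{2^{1/3}}{\Gamma(2/3)}.$$

But $\phi^{2/3} \to (2\beta_E/3)^{2/3}(r - b_E)$ and $\tilde\phi^{2/3} \to (2\beta_E/3)^{2/3}(b_E - r)$, so

$$\phi^{1/3}J_{\pm 1/3}(\phi) \Longleftrightarrow \mp\,\tilde\phi^{1/3}I_{\pm 1/3}(\tilde\phi), \tag{6.4.34}$$

where “$\Longleftrightarrow$” means “connects smoothly at $r = b_E$.”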
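The matching in Eq. (6.4.34) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes units in which $\beta_E = 1$, writes $x = r - b_E$, and evaluates the Bessel functions from their ascending power series. The function names `bessel_series`, `u_outside`, and `u_inside` are illustrative only.

```python
import math

def bessel_series(nu, x, modified=False, terms=30):
    """Ascending power series for J_nu(x), or I_nu(x) when modified=True (x > 0)."""
    total = 0.0
    for m in range(terms):
        sign = 1.0 if modified else (-1.0) ** m
        total += sign * (x / 2.0) ** (2 * m + nu) / (
            math.factorial(m) * math.gamma(m + nu + 1.0)
        )
    return total

def u_outside(x, nu):
    """phi^(1/3) J_nu(phi) with phi = (2/3) x^(3/2), for x = r - b_E > 0."""
    phi = (2.0 / 3.0) * x ** 1.5
    return phi ** (1.0 / 3.0) * bessel_series(nu, phi)

def u_inside(x, nu):
    """phi~^(1/3) I_nu(phi~) with phi~ = (2/3) (-x)^(3/2), for x = r - b_E < 0."""
    phi_t = (2.0 / 3.0) * (-x) ** 1.5
    return phi_t ** (1.0 / 3.0) * bessel_series(nu, phi_t, modified=True)

h = 1.0e-3  # small distance from the turning point (assumed units, beta_E = 1)

# nu = +1/3: the outside solution joins MINUS the inside one; both vanish
# linearly at x = 0, so we compare the slopes on the two sides.
slope_out = u_outside(h, 1.0 / 3.0) / h
slope_in = -u_inside(-h, 1.0 / 3.0) / (-h)

# nu = -1/3: the outside solution joins PLUS the inside one; both tend to
# the same constant 2^(1/3) / Gamma(2/3).
value_out = u_outside(h, -1.0 / 3.0)
value_in = u_inside(-h, -1.0 / 3.0)

print(slope_out, slope_in)
print(value_out, value_in)
```

The two slopes and the two values agree up to corrections of relative order $x^3$, which is what "connects smoothly" requires for a second-order equation.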

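To learn from these results about the WKB solutions, we note that the
conditions (6.4.19) and (6.4.23) are satisfied if $\phi \gg 1$ and $\tilde\phi \gg 1$. As long as
the approximations (6.4.25) and (6.4.29) are still valid for these large values
of $\phi$ and $\tilde\phi$, we can take wave functions in the WKB approximations as the
asymptotic limits of the solutions (6.4.28) and (6.4.32):

$$\phi^{1/3}J_{\pm 1/3}(\phi) \to \sqrt{\frac{2}{\pi}}\,\phi^{-1/6}\cos\left(\phi \mp \frac{\pi}{6} - \frac{\pi}{4}\right) = \sqrt{\frac{2}{\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6} k_E^{-1/2}(r)\cos\left(\int_{b_E}^{r} k_E(r')\,dr' \mp \frac{\pi}{6} - \frac{\pi}{4}\right) \tag{6.4.35}$$

and

$$\tilde\phi^{1/3}I_{\pm 1/3}(\tilde\phi) \to \frac{1}{\sqrt{2\pi}}\,\tilde\phi^{-1/6}\exp(\tilde\phi) = \frac{1}{\sqrt{2\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6}\kappa_E^{-1/2}(r)\exp\left(\int_{r}^{b_E}\kappa_E(r')\,dr'\right), \tag{6.4.36}$$

but

$$\tilde\phi^{1/3}\left[I_{-1/3}(\tilde\phi) - I_{+1/3}(\tilde\phi)\right] \to \sqrt{\frac{3}{2\pi}}\,\tilde\phi^{-1/6}\exp(-\tilde\phi) = \sqrt{\frac{3}{2\pi}}\left(\frac{3\beta_E^2}{2}\right)^{1/6}\kappa_E^{-1/2}(r)\exp\left(-\int_{r}^{b_E}\kappa_E(r')\,dr'\right). \tag{6.4.37}$$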


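From Eqs. (6.4.37), (6.4.35), and (6.4.34), we see that

$$\kappa_E^{-1/2}(r)\exp\left(-\int_{r}^{b_E}\kappa_E(r')\,dr'\right) \Longleftrightarrow \frac{2}{\sqrt{3}}\,k_E^{-1/2}(r)\left[\cos\left(\int_{b_E}^{r}k_E(r')\,dr' - \frac{\pi}{6} - \frac{\pi}{4}\right) + \cos\left(\int_{b_E}^{r}k_E(r')\,dr' + \frac{\pi}{6} - \frac{\pi}{4}\right)\right]$$

$$= 2k_E^{-1/2}(r)\cos\left(\int_{b_E}^{r}k_E(r')\,dr' - \frac{\pi}{4}\right). \tag{6.4.38}$$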

To find the other connection formula we need, we now have a problem.
Inspecting Eqs. (6.4.36) and (6.4.35), how do we decide in using Eq. (6.4.34)
whether $\tilde\phi^{-1/6}\exp(\tilde\phi)$ connects smoothly with $-2\phi^{-1/6}\cos(\phi - \pi/6 - \pi/4)$
or with $+2\phi^{-1/6}\cos(\phi + \pi/6 - \pi/4)$? This puzzle arises because, lurking
under the term proportional to $\tilde\phi^{-1/6}\exp(\tilde\phi)$ in the asymptotic expansion of
$\tilde\phi^{1/3}I_{\pm 1/3}(\tilde\phi)$, there are terms with unknown coefficients that are proportional
to $\tilde\phi^{-1/6}\exp(-\tilde\phi)$ and are therefore negligible for $\tilde\phi \gg 1$ but that nevertheless,
as shown in Eq. (6.4.38), connect smoothly with a term proportional to
$\phi^{-1/6}\cos(\phi - \pi/4)$. (This is known as the Stokes phenomenon.) As we saw in
deriving Eq. (6.4.38), the difference between $-\phi^{-1/6}\cos(\phi - \pi/6 - \pi/4)$ and
$+\phi^{-1/6}\cos(\phi + \pi/6 - \pi/4)$ is proportional to $\phi^{-1/6}\cos(\phi - \pi/4)$, so we can
take $\tilde\phi^{-1/6}\exp(\tilde\phi)$ to connect smoothly with $-2\phi^{-1/6}\cos(\phi - \pi/6 - \pi/4)$ or
with $+2\phi^{-1/6}\cos(\phi + \pi/6 - \pi/4)$ or with their average, plus a term proportional
to $\phi^{-1/6}\cos(\phi - \pi/4)$ with a coefficient that cannot be calculated within our
present approximations. Using the average, we have then from Eqs. (6.4.36),
(6.4.35), and (6.4.34):

$$\kappa_E^{-1/2}(r)\exp\left(\int_{r}^{b_E}\kappa_E(r')\,dr'\right) \Longleftrightarrow k_E^{-1/2}(r)\left[\cos\left(\int_{b_E}^{r}k_E(r')\,dr' + \frac{\pi}{4}\right) + \xi(E)\cos\left(\int_{b_E}^{r}k_E(r')\,dr' - \frac{\pi}{4}\right)\right], \tag{6.4.39}$$

with $\xi(E)$ an unknown coefficient.
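The two trigonometric facts used in choosing the average can be verified directly. This is only an illustrative check, with hypothetical function names: the difference of the two candidate connections is $-2\sqrt{3}\cos(\phi - \pi/4)$, and their average is $\cos(\phi + \pi/4)$, which is the first term in Eq. (6.4.39).

```python
import math

def candidate_minus(phi):
    """-2 cos(phi - pi/6 - pi/4), from the nu = +1/3 connection."""
    return -2.0 * math.cos(phi - math.pi / 6 - math.pi / 4)

def candidate_plus(phi):
    """+2 cos(phi + pi/6 - pi/4), from the nu = -1/3 connection."""
    return 2.0 * math.cos(phi + math.pi / 6 - math.pi / 4)

def stokes_difference(phi):
    """Difference of the two candidates (the common phi^(-1/6) factor is dropped)."""
    return candidate_minus(phi) - candidate_plus(phi)

def stokes_average(phi):
    """Average of the two candidates (the common phi^(-1/6) factor is dropped)."""
    return 0.5 * (candidate_minus(phi) + candidate_plus(phi))

for phi in (0.3, 1.7, 5.0, 12.4):
    # The difference is a pure cos(phi - pi/4) term; the average is cos(phi + pi/4).
    print(phi,
          stokes_difference(phi) + 2.0 * math.sqrt(3.0) * math.cos(phi - math.pi / 4),
          stokes_average(phi) - math.cos(phi + math.pi / 4))
```

Both printed residuals vanish to rounding error, so any choice among the two candidates and their average differs only by a multiple of $\cos(\phi - \pi/4)$, which is absorbed into the unknown coefficient $\xi(E)$.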
We can write the exponentials in Eq. (6.4.20) as

$$\exp\left(\pm\int_{R}^{r}\kappa_E(r')\,dr'\right) = e^{\pm G(E)}\exp\left(\mp\int_{r}^{b_E}\kappa_E(r')\,dr'\right),$$

where $G(E)$ is the barrier penetration exponent:

$$G(E) = \int_{R}^{b_E}\kappa_E(r)\,dr, \tag{6.4.40}$$

given by Eq. (6.4.11) for a Coulomb potential with $\ell = 0$. Then the wave
function $u_E(r)$ that takes the form (6.4.20) for $R < r < b_E$ takes the following
form for $r > b_E$:

$$u_E(r) = k_E^{-1/2}(r)\left[\left(2C_+(E)e^{G(E)} + \xi(E)C_-(E)e^{-G(E)}\right)\cos\left(\int_{b_E}^{r}k_E(r')\,dr' - \frac{\pi}{4}\right) + C_-(E)e^{-G(E)}\cos\left(\int_{b_E}^{r}k_E(r')\,dr' + \frac{\pi}{4}\right)\right]. \tag{6.4.41}$$
Now we need to consider the normalization of the wave functions. Since the
Hamiltonian is Hermitian, and allowed values of energy $E$ form a continuum,
we know that wave functions with different energy are orthogonal, in the sense
that

$$\int_0^\infty u_E(r)\,u_{E'}(r)\,dr = \int_0^\infty R_E(r)\,R_{E'}(r)\,r^2\,dr = N^2(E)\,\delta(E - E'). \tag{6.4.42}$$

The only question is, what is the coefficient $N^2(E)$? Once we know this, we can
define orthonormalized wave functions

$$u_E^{(N)}(r) = N^{-1}(E)\,u_E(r), \tag{6.4.43}$$


20 Without explanation, Fermi in the reference in footnote 18 took $\xi = 0$. This is not justified, but as we
shall see it makes no difference in the decay rate.
for which

$$\int_0^\infty u_E^{(N)}(r)\,u_{E'}^{(N)}(r)\,dr = \delta(E - E'), \tag{6.4.44}$$


and use these in an expansion of the time-dependent wave function. 

To find the coefficient of the delta function in Eq. (6.4.42), we can discard any
term in the integral that remains finite as $E' \to E$. The singularity as $E' \to E$
in this integral comes entirely from the infinite range where $r$ is much larger
than $b_E$, so in the integral we use the asymptotic form of Eq. (6.4.41):

$$u_E(r) \to k^{-1/2}\left[\left(2C_+(E)e^{G(E)} + \xi(E)C_-(E)e^{-G(E)}\right)\cos(kr - \pi/4) + C_-(E)e^{-G(E)}\cos(kr + \pi/4)\right], \tag{6.4.45}$$

where now $k$ is the wave number of the free particle,

$$k = \frac{1}{\hbar}\sqrt{2m_\alpha E}. \tag{6.4.46}$$
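To calculate the singular part of the integral (6.4.42), we insert a convergence
factor $\exp(-\epsilon r)$ and consider the limit as $\epsilon \to 0+$ and $E' \to E$. A straightforward
calculation gives

$$\int_0^\infty dr\,e^{-\epsilon r}\cos\left(kr \pm \frac{\pi}{4}\right)\cos\left(k'r \pm \frac{\pi}{4}\right) = \frac{1}{2}\left[\mp\frac{k + k'}{\epsilon^2 + (k + k')^2} + \frac{\epsilon}{\epsilon^2 + (k - k')^2}\right],$$

$$\int_0^\infty dr\,e^{-\epsilon r}\cos\left(kr \mp \frac{\pi}{4}\right)\cos\left(k'r \pm \frac{\pi}{4}\right) = \frac{1}{2}\left[\frac{\epsilon}{\epsilon^2 + (k + k')^2} \pm \frac{k - k'}{\epsilon^2 + (k - k')^2}\right].$$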

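Using a well-known representation of the delta function,

$$\frac{\epsilon}{\epsilon^2 + (k - k')^2} \to \pi\,\delta(k - k'),$$

and discarding any terms that are not singular when we set $k = k'$ and then let
$\epsilon$ go to zero, we have

$$\int_0^\infty dr\,\cos(kr \pm \pi/4)\cos(k'r \pm \pi/4) = \frac{\pi}{2}\,\delta(k - k') = \frac{\pi\hbar^2 k}{2m_\alpha}\,\delta(E - E'),$$

$$\int_0^\infty dr\,\cos(kr \mp \pi/4)\cos(k'r \pm \pi/4) = 0. \tag{6.4.47}$$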
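The closed forms for the damped integrals can be checked by direct numerical quadrature. The sketch below is an illustration under stated assumptions: the values of $k$, $k'$, and $\epsilon$ are arbitrary, only the upper-sign cases are tested, and the helper names are hypothetical.

```python
import math

def damped_integral(k, kp, eps, phase_k, phase_kp, r_max=80.0, steps=160000):
    """Trapezoidal estimate of int_0^inf e^(-eps r) cos(k r + phase_k) cos(kp r + phase_kp) dr.

    The upper limit is cut off at r_max, where e^(-eps r) is negligible.
    """
    h = r_max / steps

    def f(r):
        return math.exp(-eps * r) * math.cos(k * r + phase_k) * math.cos(kp * r + phase_kp)

    total = 0.5 * (f(0.0) + f(r_max))
    for i in range(1, steps):
        total += f(i * h)
    return total * h

def same_phase_closed_form(k, kp, eps):
    """Closed form for phases (+pi/4, +pi/4)."""
    return 0.5 * (eps / (eps ** 2 + (k - kp) ** 2) - (k + kp) / (eps ** 2 + (k + kp) ** 2))

def opposite_phase_closed_form(k, kp, eps):
    """Closed form for phases (-pi/4, +pi/4)."""
    return 0.5 * (eps / (eps ** 2 + (k + kp) ** 2) + (k - kp) / (eps ** 2 + (k - kp) ** 2))

k, kp, eps = 2.0, 1.5, 0.5  # arbitrary illustrative values
same_numeric = damped_integral(k, kp, eps, math.pi / 4, math.pi / 4)
opposite_numeric = damped_integral(k, kp, eps, -math.pi / 4, math.pi / 4)
print(same_numeric, same_phase_closed_form(k, kp, eps))
print(opposite_numeric, opposite_phase_closed_form(k, kp, eps))
```

Only the term $\epsilon/[\epsilon^2 + (k - k')^2]$ grows without bound at $k = k'$ as $\epsilon \to 0$, which is why it alone contributes to the coefficient of the delta function.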
Equation (6.4.45) then gives Eq. (6.4.42), with

$$N^2(E) = \frac{\pi\hbar^2}{2m_\alpha}\left[\left(2C_+(E)e^{G(E)} + \xi(E)C_-(E)e^{-G(E)}\right)^2 + C_-^2(E)e^{-2G(E)}\right]. \tag{6.4.48}$$

We have been ruthless here in discarding non-singular terms, but this is a
precise result, apart from the WKB approximation, which was used in deriving
Eq. (6.4.41).

We now turn to calculating the time-dependent reduced wave function $u(r, t)$,
assuming that at $t = 0$ it takes the form

$$u(r, 0) = \begin{cases} u_{E_1}(r) & r < R \\ 0 & r > R \end{cases} \tag{6.4.49}$$

where $E_1$ is the energy of an unstable state for which $C_+(E_1) = 0$. We can
expand this in orthonormalized solutions of the Schrödinger equation

$$u(r, 0) = \int_0^\infty dE\,u_E^{(N)}(r)\int_0^\infty dr'\,u_E^{(N)}(r')\,u(r', 0) = \int_0^\infty dE\,N^{-2}(E)\,u_E(r)\int_0^R dr'\,u_E(r')\,u_{E_1}(r').$$


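The time-dependent Schrödinger equation tells us that to find the wave function
at any later time we must insert a factor $e^{-iEt/\hbar}$ in the integrand:

$$u(r, t) = \int_0^\infty dE\,e^{-iEt/\hbar}\,N^{-2}(E)\,u_E(r)\int_0^R dr'\,u_E(r')\,u_{E_1}(r') \tag{6.4.50}$$

$$= \frac{2m_\alpha}{\pi\hbar^2}\int_0^\infty dE\,e^{-iEt/\hbar}\,u_E(r)\,\frac{\int_0^R dr'\,u_E(r')\,u_{E_1}(r')}{\left(2C_+(E)e^{G(E)} + \xi(E)C_-(E)e^{-G(E)}\right)^2 + C_-^2(E)e^{-2G(E)}}. \tag{6.4.51}$$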


For $r' < R$ the wave function $u_E(r')$ is unaffected by the potential barrier,
and therefore (as shown for example in Eq. (6.4.9)) varies smoothly with $E$.
On the other hand, the term in the denominator proportional to $e^{2G(E)}$ makes
the integrand very small except very near the energies of unstable states, where
$C_+(E)$ vanishes. The integral is therefore dominated by values of $E$ very near
the energies $E_n$ at which $C_+(E)$ vanishes. These are the energies of nearly
stable states, so the wave functions $u_{E_n}(r)$ are approximate eigenfunctions of
the Hamiltonian, and therefore are approximately orthogonal, so in Eq. (6.4.51)
the integral $\int_0^R dr'\,u_{E_n}(r')\,u_{E_1}(r')$ is very small for $n \neq 1$. For $E$ very near
$E_1$, we can approximate $C_+(E) \to C_+'(E_1)(E - E_1)$. Since the contribution
of other energies is exponentially suppressed, we can set $E = E_1$ everywhere
except in the factor $E - E_1$ and the exponential, and extend the range of energy
integration to run over the whole real axis, and for $r < R$ write

$$u(r, t) \simeq \frac{2m_\alpha}{\pi\hbar^2}\,u_{E_1}(r)\int_0^R dr'\,u_{E_1}^2(r')\int_{-\infty}^{\infty}dE\,\frac{e^{-iEt/\hbar}}{\left(2C_+'(E_1)e^{G(E_1)}(E - E_1) + \xi(E_1)C_-(E_1)e^{-G(E_1)}\right)^2 + C_-^2(E_1)e^{-2G(E_1)}}. \tag{6.4.52}$$


For $t > 0$ the contour of integration over $E$ can be closed with a large semicircle
in the lower half of the complex plane, on which $e^{-iEt/\hbar}$ is exponentially small.
Since this contour is now closed clockwise, the integral is given by $-2i\pi$ times
the residue of the pole at $E = \tilde E_1 - iC_-(E_1)e^{-2G(E_1)}/2C_+'(E_1)$, where

$$\tilde E_1 = E_1 - \xi(E_1)e^{-2G(E_1)}C_-(E_1)/2C_+'(E_1).$$

The wave function in the potential well therefore goes as

$$u(r, t) \propto e^{-i\tilde E_1 t/\hbar}\,e^{-\Gamma t/2}\,u_{E_1}(r), \tag{6.4.53}$$

where

$$\Gamma = \frac{C_-(E_1)}{\hbar\,C_+'(E_1)}\,e^{-2G(E_1)}. \tag{6.4.54}$$

From the square of Eq. (6.4.53), we see that $\Gamma$ is the rate of decay of the
probability density $|u(r, t)|^2$ of the alpha particle within the nucleus. The
Stokes phenomenon has led to an incalculable but exponentially small shift
in the oscillation frequency of the wave function, but has no effect on the
decay rate. Equation (6.4.54) justifies the appearance of the suppression factor
$e^{-2G(E_1)}$ in Eq. (6.4.14).
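The step from the energy integral to exponential decay can be illustrated numerically. In the sketch below the symbols $a$, $b$, $c$ are shorthand introduced only for this example, standing for $2C_+'(E_1)e^{G(E_1)}$, $\xi(E_1)C_-(E_1)e^{-G(E_1)}$, and $C_-(E_1)e^{-G(E_1)}$; their numerical values are arbitrary, and units with $\hbar = 1$ are assumed. The pole prediction is $(\pi/ac)\,e^{-i\tilde E_1 t/\hbar}e^{-\Gamma t/2}$ with $\tilde E_1 = E_1 - b/a$ and $\Gamma = 2c/a\hbar$.

```python
import cmath
import math

def energy_integral(t, a, b, c, e1=0.0, hbar=1.0, half_width=400.0, steps=200000):
    """Midpoint-rule estimate of int dE e^(-i E t / hbar) / [(a (E - e1) + b)^2 + c^2].

    The infinite range is cut off at e1 +/- half_width, far outside the resonance.
    """
    h = 2.0 * half_width / steps
    total = 0j
    for i in range(steps):
        e = e1 - half_width + (i + 0.5) * h
        total += cmath.exp(-1j * e * t / hbar) / ((a * (e - e1) + b) ** 2 + c ** 2)
    return total * h

def pole_prediction(t, a, b, c, e1=0.0, hbar=1.0):
    """Residue result: (pi / (a c)) e^(-i E~ t / hbar) e^(-Gamma t / 2)."""
    shifted_energy = e1 - b / a        # real part of the pole position
    gamma = 2.0 * c / (a * hbar)       # decay rate of |u|^2
    return (math.pi / (a * c)) * cmath.exp(-1j * shifted_energy * t / hbar) * math.exp(-gamma * t / 2.0)

a, b, c, t = 2.0, 0.3, 0.5, 1.5        # arbitrary illustrative values
numeric = energy_integral(t, a, b, c)
predicted = pole_prediction(t, a, b, c)
print(numeric, predicted)
```

The coefficient $b$, which carries the unknown $\xi$, enters only the phase of the result, while the modulus depends on $c/a$ alone. This is the sense in which the Stokes phenomenon shifts the oscillation frequency but leaves the decay rate unchanged.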
The earliest studies of radioactivity revealed the existence of a distinct class of
radioactive processes, beta decay, in which an electron is emitted in a transition
between nuclear states. For instance, $^{234}$Th, which is itself a product of the
alpha decay of $^{238}$U, undergoes beta decay to $^{234}$Pa. As we have seen, at first
beta decay was taken as evidence for the view that nuclei consist of protons 
and electrons, but this interpretation was abandoned with the realization in the 
1930s that nuclei are composed of protons and neutrons. An electron is created 
at the moment of beta decay when a neutron turns into a proton — the electron 
is no more in the nucleus before it is emitted than a photon is in an atom before 
it is radiated. 
But there was a peculiar difference between the observed energies of the
photons emitted in atomic transitions and the electrons emitted in beta decay. 
As we saw in Section 3.4, Bohr had realized in 1913 that a photon emitted 
in any given atomic transition has a unique energy, given by the difference in 
energies of the initial and final atomic states. Chadwick discovered in 1914 
that the energies of electrons emitted in a beta transition between any specific 
nuclear states do not have any one value, but occupy a range up to some def- 
inite maximum. This might be explained if a photon is emitted along with the 
electron, with the energy of the nuclear transition shared between the electron 
and the photon in a proportion that varies from one decay event to another. The 
electron energy would come close to a maximum value, equal to the energy 
released in the nuclear transition, only when the photon happens to have very 
low energy. If beta decay produced a photon along with the electron, then 
when these decay products are caught in a surrounding medium the heat energy 
given to the medium would be the same in each decay event, equal to the 
energy difference of the initial and final nuclear states, and hence equal to the 
maximum value observed for the electron energy in this decay. But experiments 
in 1927 by C. D. Ellis (1895-1980) and W. A. Wooster (1903-1984) showed 
that the average energy deposited in the medium surrounding the decaying 
nucleus was not equal to the maximum energy of the electron, but instead to its 
average, as if whatever energy was not carried by the electron was simply lost. 
Bohr was even led by this to speculate that energy might not be conserved in 
beta decays. 

A different explanation was offered in 1930 by Pauli. He proposed that the
electron in beta decay is indeed accompanied by another particle that, being
electrically neutral, had escaped detection, but this neutral particle is not a
photon. Rather, it is an extremely penetrating particle that is not captured in the
surrounding medium. The particle soon became known as a neutrino,
symbolized $\nu$. The underlying reaction is $n \to p + e^- + \nu$ (where $n$ and $p$ stand for the
neutron and proton, and the electron is denoted $e^-$, for a reason we will come
to presently). Among many other examples, this is responsible for the decay of
the ground state of boron 12 to carbon in the reaction $^{12}\mathrm{B} \to {}^{12}\mathrm{C} + e^- + \nu$,
as well as for the decay of the free neutron, $n \to p + e^- + \nu$. Since neutrons,
protons, and electrons have spin 1/2, angular momentum conservation requires
the neutrino to have a half-integer spin. It is in fact known to have spin 1/2.

There are also radioactive decays in which instead of an electron there is
emitted a positron, $e^+$, the electron's antiparticle, with the same mass but opposite
electric charge.²¹ The conservation of energy forbids the process $p \to n + e^+ + \nu$


21 The existence of the positron was anticipated in 1930 by P. A. M. Dirac, Proc. Roy. Soc. A126, 360 (1930).
He had developed a relativistic version of the Schrödinger equation, which turned out to have solutions
corresponding to states of negatively charged electrons with negative energy as well as states with positive
energy. Dirac's interpretation was that these negative-energy states are normally filled, one electron to
each negative-energy state in accordance with the Pauli exclusion principle, but that occasionally there
for free protons, but in nuclei this process can produce decays such as the beta
decay of the ground state of nitrogen 12, $^{12}\mathrm{N} \to {}^{12}\mathrm{C} + e^+ + \nu$.

In 1934 Fermi proposed a detailed theory of beta decay.²² In Fermi's theory
the interaction Hamiltonian takes the form

$$H_\beta = (\hbar c)^3 G_F\,\eta_{\mu\nu}\int d^3x\,V_N^\mu V_L^\nu + \text{c.c.}, \tag{6.5.1}$$

where $G_F$ is a constant; $V_N^\mu$ and $V_L^\nu$ are operators with the same Lorentz and
space inversion transformation properties as the electric current $J^\mu$ and with
the dimensionality of densities (that is, inverse volumes); $V_N^\mu$ acts to change
neutrons to protons; $V_L^\nu$ acts to create electrons and neutrinos; and as usual
c.c. indicates the adjoint of the foregoing term. The factor $(\hbar c)^3$ is extracted
from $G_F$ for later convenience. As we will see in the appendix to Section 7.4
these currents are bilinear functions of Dirac fields, but we will not need that
information for our limited purposes here.

Fermi's theory almost immediately needed modification. The three-vector
part of the current $V_N^\mu$ is odd under space inversion, so when acting on nuclear
states it gives a contribution proportional to nucleon velocities $\mathbf{v}$, and so is
suppressed by a factor of order $|\mathbf{v}|/c$, which is small in nuclei, as in atoms.
This leaves the time component $V_N^0$, which is even under space inversion and
is a rotational scalar. For decays that are not suppressed by a centrifugal barrier
there is no orbital angular momentum, so in these decays neither the parity nor
spin of the nuclear states can change. But many beta decays were observed
in which the spin of the nuclear state did change by one unit, and which yet

appears a vacancy which we observe as a particle of positive charge and positive energy. At first Dirac 
identified these holes as protons, but then in 1932 a positively charged particle was unexpectedly found in 
cosmic rays by C. D. Anderson, Phys. Rev. 43, 491 (1932). (This article is included in Beyer, Foundations 
of Nuclear Physics, listed in the bibliography.) The cloud chamber tracks of these particles were observed 
to have the same curvature in a magnetic field as electron tracks, but in the opposite direction, consistent 
with a particle having the same mass as an electron and a charge of equal magnitude but opposite sign. It 
was widely supposed that these were Dirac’s holes. 

The interpretation of positrons as vacancies in a sea of negative-energy electrons has largely been
abandoned. Dirac's relativistic wave equation works only for particles of spin 1/2. This at first seemed
like a triumph because protons and electrons were known to have spin 1/2, but by now we know of
several particles of spin 0 and spin 1 (the $H^0$, $W^+$, $W^-$, and $Z^0$) that seem every bit as elementary as
the electron. Furthermore, the $W^+$ and $W^-$ are each other's antiparticles, in the same sense as the $e^+$
and $e^-$. But these are bosons, which do not obey the exclusion principle and so could not form a stable
sea of negative-energy particles. As described in the appendix to Section 7.4, Dirac's equation survives
as the field equation satisfied by the quantum field of particles of spin 1/2 but not, as Dirac thought, as a
relativistic version of a Schrödinger equation for a probability amplitude.

As explained in Section 7.4, we now understand as a consequence of Lorentz invariance and quantum
mechanics that for every species of particle, elementary or not, fermion or boson, there is a corresponding
species of antiparticle, with the same mass and spin but opposite electric charge. The only qualification is
that a few types of electrically neutral particles like the photon and the $Z^0$ are their own antiparticles.

22 E. Fermi, Zeit. Phys. 88, 161 (1934). This article is reprinted in Beyer, Foundations of Nuclear Physics,
listed in the bibliography. In his article Fermi cited an unpublished suggestion by Pauli that a neutral
weakly interacting particle was emitted along with electrons in beta decay.
seemed to be “allowed” in the sense of having rates comparable to typical other
beta decays with similar energy. For instance, the ground states of the nuclei
$^{12}$B and $^{12}$N mentioned in Section 6.2 have spin one and even parity, and yet
have allowed beta decays into the ground state of $^{12}$C, which has spin zero and
even parity.

In order to allow such decays, Fermi's theory was modified by adding an
additional term to the interaction Hamiltonian:²³

$$H_\beta = (\hbar c)^3 G_F\,\eta_{\mu\nu}\int d^3x\left[V_N^\mu V_L^\nu + A_N^\mu A_L^\nu\right] + \text{c.c.}, \tag{6.5.2}$$

where $A_N^\mu$ like $V_N^\mu$ turns neutrons into protons; $A_L^\nu$ like $V_L^\nu$ creates electrons
and neutrinos; and $A_N^\mu$ and $A_L^\nu$ are axial vectors, that is, like $V_N^\mu$ and $V_L^\nu$ they
transform as four-vectors under proper Lorentz transformations, but they have
opposite properties under space inversion: $\mathbf{A}_N$ is even and $A_N^0$ is odd, and likewise
for $A_L^\nu$. In consequence $\mathbf{A}_N$ acting on nuclear states can make contributions
proportional to nucleon spin vectors, allowing beta decays in which spin changes
by one unit, such as the beta decays of $^{12}$B and $^{12}$N, without suppression of
the rate by factors of order $|\mathbf{v}|/c$ or centrifugal barriers. With the interaction
Hamiltonian (6.5.2), the selection rules for “allowed” beta decays are that the
nuclear parity does not change and that the nuclear spin can change by at most
one unit.

To estimate these “allowed” rates, we note that in order for $H_\beta$ to have the
dimensionality of energy, the constant $(\hbar c)^3 G_F$ must have the dimensions of
energy times volume. Hence $G_F$ has the convenient (and conventional)
dimensionality of energy$^{-2}$, which is why the factor $(\hbar c)^3$ was inserted in Eqs. (6.5.1)
and (6.5.2). The rate $\Gamma_\beta$ of any beta decay process is proportional to $G_F^2$, and if
the energy $E$ released in the decay is much larger than $m_e c^2 = 0.511$ MeV then
apart from the factor $G_F^2$ it can only depend on $E$, so in order for it to have the
dimensionality of a rate it must take the form

$$\Gamma_\beta \approx \frac{G_F^2 E^5}{\hbar}. \tag{6.5.3}$$

This $E^5$ dependence is observed for high-energy beta decays that satisfy the
selection rules for “allowed” beta decays. The energy $G_F^{-1/2}$ turns out to be
very large. For instance, as we saw in Section 6.2, the energy released in
the beta decay of $^{12}$B to the ground state of $^{12}$C is 0.0144 atomic mass units times $c^2$, or 13.4 MeV,
much larger than $m_e c^2$. The rate is 48.5 sec$^{-1}$. Using these numbers in
Eq. (6.5.3) gives $G_F^{-1/2} \approx 2 \times 10^3$ GeV. (This energy is large in part because
weak interactions are transmitted by a heavy particle, the $W^\pm$ particle with


23 G. Gamow and E. Teller, Phys. Rev. 49, 895 (1936). There were other possibilities involving scalar and 
tensor operators that were not finally excluded by experimental data until the 1950s. 
mass 80.4 GeV/$c^2$, and in part because the interactions that emit and absorb
$W^\pm$ particles are characterized by a small constant, of the same order as
the fine-structure constant $e^2/\hbar c \simeq 1/137$ of electrodynamics, which gives
$G_F^{-1/2} \approx m_W c^2 \times \sqrt{137} \approx 10^3$ GeV. A more accurate value along with a
more precise definition for $G_F$ will be given in the appendix to Section 7.4.)
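The arithmetic behind this estimate is short enough to reproduce. The sketch below assumes only the dimensional relation $\Gamma_\beta \approx G_F^2E^5/\hbar$ of Eq. (6.5.3) and the numbers quoted in the text; the value of $\hbar$ in MeV s is a standard constant, and the variable names are illustrative.

```python
import math

HBAR_MEV_S = 6.582e-22      # hbar in MeV * s (standard value, rounded)

energy_mev = 13.4           # energy released in the 12B -> 12C decay, from the text
rate_per_s = 48.5           # decay rate quoted in the text

# Invert Gamma = G_F^2 E^5 / hbar for G_F (units MeV^-2), then form G_F^(-1/2).
g_f = math.sqrt(HBAR_MEV_S * rate_per_s / energy_mev ** 5)
scale_gev = g_f ** -0.5 / 1000.0

# Rough comparison from the text: m_W c^2 times sqrt(137).
w_estimate_gev = 80.4 * math.sqrt(137.0)

print(f"G_F^(-1/2) from the 12B rate: {scale_gev:.3g} GeV")
print(f"m_W c^2 * sqrt(137):          {w_estimate_gev:.3g} GeV")
```

Both numbers come out near $10^3$ GeV. Because Eq. (6.5.3) omits all dimensionless factors, agreement within a factor of a few is all that should be expected.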

The extremely low rate at which neutrinos were absorbed by the medium
surrounding the radioactive nuclei in the 1927 experiments of Ellis and Wooster
is due to the extreme weakness of interactions such as beta decay that involve
neutrinos. In general the rates of neutrino interaction processes are characterized
by the presence in the rate of the factor $G_F^2$, which is what makes them so weak.
For instance, the cross section for the neutrino reactions $\nu + p \to e^+ + n$ and
$\nu + n \to e^- + p$ (whether for a free proton or a proton or neutron inside a
nucleus) is proportional to $G_F^2$, and so, since it has the units of area, dimensional
analysis requires that at a neutrino energy $E$ considerably above $m_e c^2$ the cross
section takes the form

$$\sigma \approx (\hbar c E)^2 G_F^2.$$

Recalling that $\hbar c = 197\ \text{MeV} \times 10^{-13}$ cm, we see that for a relatively
high-energy beta decay neutrino with energy $E = 10$ MeV the cross section $\sigma$ is of
order $10^{-44}$ cm$^2$. In ordinary matter, with a number density $n$ of nucleons
of order $10^{24}$ cm$^{-3}$, this gives a mean free path $1/n\sigma \approx 10^{20}$ cm, or about
100 light years. It is no wonder that Ellis and Wooster did not detect energy
deposited by neutrinos in their experiment. There never was any hope of
detecting neutrinos from ordinary laboratory samples of radioactive material,
but nuclear reactors emit such enormous floods of neutrinos from the beta
decay of fission products that at last in 1956 Clyde Cowan, Jr. (1919-1974)
and Frederick Reines (1918-1998) were able to detect neutrinos produced at
the Savannah River reactor by detecting gamma rays from the annihilation of
positrons produced in the reaction $\nu + p \to n + e^+$.
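These order-of-magnitude figures can be reproduced in a few lines. The sketch below assumes the rounded value $G_F^{-1/2} \approx 2 \times 10^3$ GeV obtained above and the purely dimensional formula $\sigma \approx (\hbar c E)^2 G_F^2$; the light-year conversion is a standard constant, and the function and variable names are illustrative.

```python
HBARC_MEV_CM = 197e-13          # hbar * c in MeV * cm, as quoted in the text
LIGHT_YEAR_CM = 9.461e17        # one light year in cm (standard value)

def cross_section_cm2(energy_mev, gf_inv_sqrt_mev):
    """Dimensional estimate sigma ~ (hbar c E)^2 G_F^2, with G_F = (G_F^(-1/2))^(-2)."""
    g_f = gf_inv_sqrt_mev ** -2             # MeV^-2
    return (HBARC_MEV_CM * energy_mev) ** 2 * g_f ** 2

def mean_free_path_cm(number_density_cm3, sigma_cm2):
    """Mean free path 1 / (n sigma)."""
    return 1.0 / (number_density_cm3 * sigma_cm2)

sigma_estimate = cross_section_cm2(10.0, 2.0e6)     # E = 10 MeV, G_F^(-1/2) = 2e3 GeV
path_cm = mean_free_path_cm(1.0e24, 1.0e-44)        # the text's round numbers
path_ly = path_cm / LIGHT_YEAR_CM

print(f"sigma from the dimensional formula: {sigma_estimate:.2e} cm^2")
print(f"mean free path: {path_cm:.1e} cm = {path_ly:.0f} light years")
```

With these rounded inputs the formula gives a few times $10^{-45}$ cm$^2$, within an order of magnitude of the $10^{-44}$ cm$^2$ quoted in the text, which is as much as a dimensional estimate can promise.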

All rates for processes involving neutrinos are suppressed by the factor $G_F^2$,
and there are also reactions due to other weak interactions that do not involve
neutrinos but are similarly suppressed. Among these are the decays of a particle
called the K meson, with a mass of 495 MeV/$c^2$, into two-pion and three-pion
states, decays that are very slow compared with processes such as the decay
of the three-three resonance into a pion and a nucleon that occur through the
action of strong interactions.

There is another common feature of weak interaction processes, beyond their 
weakness. They violate some of the symmetry principles obeyed by strong and 
electromagnetic interactions. It appeared that the charged K meson decayed 
both into two-pion states that are invariant under the space inversion transfor- 
mation x — —x, and also into three-pion states that change sign under space 
inversion, which would not be possible if the space inversion operator
commutes with the Hamiltonian. It was this that led Tsung-Dao Lee (1926- ) and
Chen-Ning Yang (1922- ) in 1956²⁴ to suggest that weak interactions in general
do not respect invariance under space inversion, a suggestion that was soon
verified in the beta decay²⁵ of $^{60}$Co and in the decays of charged pions.²⁶
Weak interactions also violate a symmetry between particles and antiparticles,
and they violate the conservation of several quantities (collectively known as
flavors), that are conserved by the strong and electromagnetic interactions.

It used to be thought that neutrinos have zero mass. For massless particles
the helicity, the component of angular momentum in the direction of motion,
is Lorentz invariant. In 1957 Lee and Yang proposed²⁷ that the neutrinos
emitted with electrons or positrons always have helicities $\hbar/2$ and $-\hbar/2$,
respectively. This was only possible if weak interactions violate invariance under space
inversion, because space inversion transformations reverse the direction of the
neutrino's motion while leaving its spin unchanged, and so reverse the helicity.
This proposal was incorporated in another change in the beta decay interaction
that in our present schematic notation takes the form

$$H_\beta = \frac{(\hbar c)^3 G_F}{\sqrt{2}}\,\eta_{\mu\nu}\int d^3x\left[V_N^\mu + A_N^\mu\right]\left[V_L^\nu + A_L^\nu\right] + \text{c.c.}, \tag{6.5.4}$$

in which the terms $V_N^\mu A_{L\mu}$ and $A_N^\mu V_{L\mu}$ evidently violate space inversion
symmetry.

But Lee's and Yang's proposal regarding neutrino helicity could not be
universally and literally true unless neutrinos were massless, because the observed
direction of motion of a massive particle is reversed if the observer travels with
higher speed in the same direction, which does not affect its spin. It is now
known that neutrinos have very small but non-zero mass, much less than the
electron mass, and so like any massive particle of spin 1/2 neutrinos exist both
in states with angular momentum components $\hbar/2$ and $-\hbar/2$ in any direction.
Experiment shows that the emission of an electron or positron in the beta decay
of a nucleus at rest is accompanied by the emission of a neutrino that is
overwhelmingly likely to be in a state with angular momentum component in the
direction of motion, respectively $+\hbar/2$ or $-\hbar/2$, as proposed by Lee and Yang,
but this would not be the case if the neutrino were viewed by a more rapidly
moving observer.

There is a complication regarding neutrinos that I have so far not mentioned.
There are two other charged leptons, particles like the electron that have only
electromagnetic and weak but not strong interactions. These are the muon, with
mass 105.658 MeV/$c^2$, mentioned briefly in Section 4.3, and the more recently


24 T. D. Lee and C. N. Yang, Phys. Rev. 104, 254 (1956). 
25 C. S. Wu et al., Phys. Rev. 105, 1413 (1957).
26 R. Garwin, L. Lederman, and M. Weinrich, Phys. Rev. 105, 1415 (1957); J. Friedman and V. Telegdi,
Phys. Rev. 105, 1681 (1957).
27 T. D. Lee and C. N. Yang, Phys. Rev. 105, 1671 (1957).
discovered tauon, with mass 1776.82 MeV/$c^2$. Both, like electrons, are emitted
by strongly interacting particles along with neutrinos, but these neutrinos are
not the same as the neutrinos emitted along with electrons or positrons in beta
decay. Rather, the neutrinos emitted in beta decay and along with the production
of muons and tauons are of three different types. For instance, the neutrino
emitted along with a muon in the decay of a charged pion can create another
muon in a reaction $\nu + n \to p + \mu^-$, but it cannot create an electron, and
the neutrino emitted along with an electron in beta decay can create another
electron in a reaction $\nu + n \to p + e^-$, but even if its energy were high enough
it could not create a muon.

Except that, in a sense, it can. For years there was a mysterious deficiency in
the number of neutrinos observed to be coming from the Sun.²⁸ These would
be electron-type neutrinos, created in reactions such as $p + p \to d + e^+ + \nu$.
Bruno Pontecorvo (1919-1993) suggested²⁹ that this is because neutrinos have
mass but the states with definite mass are not electron-type or muon-type or
tauon-type neutrinos. Rather, each of these is a superposition of neutrino states
of definite mass. According to this idea, the electron-type neutrinos emitted by
the Sun are superpositions of states of definite mass, which oscillate at different
rates on their way to the Earth, arriving as incoherent mixtures of neutrinos
of all three types. In the search for solar neutrinos the detectors were looking
for the reaction $\nu + {}^{37}\mathrm{Cl} \to e^- + {}^{37}\mathrm{Ar}$, and were therefore sensitive only to
electron-type neutrinos, which according to Pontecorvo is why fewer neutrinos
were detected than would have been the case if neutrinos were massless, the
undetected neutrinos arriving as muon-type or tauon-type. This hypothesis was
confirmed when it became possible to detect solar neutrinos in the reaction
$\nu + d \to \nu + p + n$, which is equally sensitive to neutrinos of all three types, and
the number seen was just what was expected. The existence of neutrino masses
has by now been convincingly confirmed in numerous terrestrial experiments,
which, although they have not yielded values for individual neutrino masses,
indicate that they are in the range of 0.01 to 0.1 eV/$c^2$.

When neutrinos were thought to have zero mass it was common to call the 
particle emitted along with an electron an antineutrino, reserving the term neu- 
trino for the particle emitted along with a positron. This was to preserve a widely 
accepted conservation law, of a quantity known as lepton number, analogous to 
baryon number. Electrons and neutrinos were supposed to have lepton number 
+1; positrons and antineutrinos would have lepton number —1, while protons 
and neutrons would have lepton number zero, so that lepton number would be 
conserved in both kinds of beta decay. But it is not possible to attribute different 
values for lepton number or any other conserved quantity to the neutral particles 


28 J. N. Bahcall, Phys. Rev. Lett. 12, 300 (1964); Phys. Rev. 135, B137 (1964); R. Davis, Jr., Phys. Rev. Lett.
12, 303 (1964); R. Davis, Jr., D. S. Harmer, and K. C. Hoffman, Phys. Rev. Lett. 20, 1205 (1968).
29 B. Pontecorvo, JETP 53, 1717 (1967).
emitted with electrons or positrons in beta decay if they are just different spin
states of the same particle.

Now that we know that neutrinos have mass, there are two widely considered 
points of view regarding the nature of neutrinos and of lepton number and the 
origin of neutrino masses. 

First, in order to preserve the exact conservation of lepton number it would be
necessary to suppose that the neutrino fields of electron-type, muon-type, and
tauon-type with lepton number $-1$ each have distinct adjoints with lepton
number $+1$. In this view it is the states of helicity $+\hbar/2$ of the field with lepton
number $-1$ that have been observed to be emitted with electrons in beta decay,
and it is the states of helicity $-\hbar/2$ of the adjoint field with lepton number $+1$
that have been observed to be emitted with positrons, while the other helicity
states of the two fields of each type exist but are so far unobserved. Neutrinos
of this description are often called Dirac neutrinos, because their fields are
described in the same way as in the description of electrons by Dirac, discussed
in the appendix to Section 7.4.

The other possibility, often associated with the name of Ettore Majorana
(1906-1938), is that lepton number is not conserved, and the three types of
neutral particles emitted with negative and positive leptons are states of the same
three spin 1/2 particles, which as a consequence of Eq. (6.5.4) are
overwhelmingly likely to be emitted with helicity $\hbar/2$ when emitted with $e^-$, $\mu^-$, $\tau^-$ and
with helicity $-\hbar/2$ when emitted with $e^+$, $\mu^+$, $\tau^+$. We can then regard these
three neutral particles as their own antiparticles, like the photon or the $\pi^0$.

For what it is worth, the Majorana alternative seems to me a more econom- 
ical and plausible view, which is why in this section I have not distinguished 
neutrinos and antineutrinos. In the Dirac case neutrinos get masses in much the 
same way as the other leptons and the quarks of the Standard Model, so it is 
mysterious why they are so light compared with other elementary particles. On 
the other hand, the masses of Majorana neutrinos can only arise from effects at 
very high energy, and are naturally in the observed range. 

Fermi’s theory correctly described the probability distribution for the 
energy of the electron or positron emitted in beta decay, a distribution that 
was unaffected by the subsequent modifications in the interaction Hamiltonian 
described above. With these modifications Fermi’s theory has survived as a cor- 
rect approximate theory for nuclear beta decay. It was in fact the first successful 
application of quantum field theory outside the context of electrodynamics. 


7 
Quantum Field Theory 


Chapter 5 described quantum mechanics in the context of particles moving in 
a potential. This application of quantum mechanics led to great advances in 
the 1920s and 1930s in our understanding of atoms, molecules, and much else. 
But starting around 1930, and increasingly since then, theoretical physicists 
have become aware of a deeper description of matter, in terms of fields. 
Just as Einstein and others had much earlier recognized that the energy and 
momentum of the electromagnetic field is packaged in bundles, the particles 
later called photons, so also there is an electron field whose energy and 
momentum are packaged in particles, observed as electrons, and likewise for 
every other sort of elementary particle. Indeed, in practice this is what we now 
mean by an elementary particle: it is the quantum of some field, which appears 
as an ingredient in whatever seem to be the fundamental equations of physics 
at any stage in our progress. 

This is a good place to warn of an old misunderstanding. It used to be thought 
by some theorists (perhaps de Broglie) that the wave function of a particle 
is a field, something like the electromagnetic field. Just as the creation and 
annihilation of photons was seen as a consequence of the application of quantum 
mechanics to the electromagnetic field, some theorists came to think that the 
creation and annihilation of electrons and other particles could be understood 
through the application of quantum mechanics to the wave function itself, a pro- 
cess known as second quantization. This does not work. The electromagnetic 
field cannot be interpreted like a wave function as a probability amplitude, 
and the Schrödinger wave function does not have the Lorentz transformation
property of a scalar field. The wave function is not a field — it is a representation 
of a physical state. As discussed in Section 5.10, it is the component of the 
state vector in some basis, such as one labeled by the possible positions of a 
particle. Even though it is not generally useful to do so, we can also introduce
wave functions for fields: they are functionals of the field, quantities that
depend on the value taken by the field at every point in space, equal to the 
component of the state vector in a basis labeled by these field values. One still 
sometimes hears talk of second quantization, but this idea is an obsolete 
historical relic. 


7.1 Canonical Formalism for Fields 


We begin by restating the canonical formalism described in Section 5.7, now in 
the context of fields. Here the $q_N(t)$ are fields $\varphi_n(\mathbf{x}, t)$, with sums over the label
$N$ now comprising both sums over the discrete label $n$ which distinguishes one
type of field from another and integrals over the spatial argument $\mathbf{x}$. In order
to have any chance of a Lorentz-invariant theory, the action here must take the
form of an integral over spacetime of a function of space derivatives as well as
time derivatives of the fields


$$ I[\varphi] = \int d^3x \int_{-\infty}^{+\infty} dt \; \mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big) . \tag{7.1.1} $$


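The function $\mathcal{L}$ is known as the Lagrangian density. Comparing this with
Eq. (5.7.11), we see that the Lagrangian here is


$$ L[\varphi(t), \dot\varphi(t)] = \int d^3x \; \mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big) . \tag{7.1.2} $$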


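This Lagrangian is a functional rather than a function of $\varphi_n(\mathbf{x}, t)$ and $\dot\varphi_n(\mathbf{x}, t)$;
that is, it depends on the values of $\varphi_n(\mathbf{x}, t)$ and $\dot\varphi_n(\mathbf{x}, t)$ for all $n$ and $\mathbf{x}$ at a given
time $t$. Therefore, where derivatives of $L$ appear in the canonical formalism as
described in Section 5.7, they should now be interpreted as functional derivatives.
In general, the functional derivatives $\delta F/\delta\varphi$ and $\delta F/\delta\dot\varphi$ of any functional
$F$ of $\varphi_n(\mathbf{x}, t)$ and $\dot\varphi_n(\mathbf{x}, t)$ at a fixed time $t$ are defined by the prescription that the
effect of independent infinitesimal variations in the arguments of the functional
is given by


$$ F[\varphi(t) + \delta\varphi(t), \dot\varphi(t) + \delta\dot\varphi(t)]
 = F[\varphi(t), \dot\varphi(t)] + \sum_n \int d^3x \left[ \frac{\delta F}{\delta\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t) + \frac{\delta F}{\delta\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t) \right] . \tag{7.1.3} $$
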
For the particular functional (7.1.2), we have 


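$$ \begin{aligned}
L[\varphi(t) + \delta\varphi(t), \dot\varphi(t) + \delta\dot\varphi(t)]
 &= \int d^3x \; \mathcal{L}\big(\varphi_n(\mathbf{x},t) + \delta\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t) + \nabla\delta\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t) + \delta\dot\varphi_n(\mathbf{x},t)\big) \\
 &= L[\varphi(t), \dot\varphi(t)] + \sum_n \int d^3x \left[ \frac{\partial\mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big)}{\partial\varphi_n(\mathbf{x},t)}\,\delta\varphi_n(\mathbf{x},t) \right. \\
 &\qquad + \frac{\partial\mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big)}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\,\frac{\partial\,\delta\varphi_n(\mathbf{x},t)}{\partial x_i} \\
 &\qquad \left. + \frac{\partial\mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big)}{\partial\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t) \right] \\
 &= L[\varphi(t), \dot\varphi(t)] + \sum_n \int d^3x \left[ \left( \frac{\partial\mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big)}{\partial\varphi_n(\mathbf{x},t)} \right.\right. \\
 &\qquad \left. - \frac{\partial}{\partial x_i}\,\frac{\partial\mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big)}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)} \right) \delta\varphi_n(\mathbf{x},t) \\
 &\qquad \left. + \frac{\partial\mathcal{L}\big(\varphi_n(\mathbf{x},t), \nabla\varphi_n(\mathbf{x},t), \dot\varphi_n(\mathbf{x},t)\big)}{\partial\dot\varphi_n(\mathbf{x},t)}\,\delta\dot\varphi_n(\mathbf{x},t) \right] ,
\end{aligned} $$

where as usual a repeated index $i$ is summed over the values 1, 2, 3. Comparing
this with the definition (7.1.3), we have


$$ \frac{\delta L}{\delta\varphi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\varphi_n(\mathbf{x},t)} - \frac{\partial}{\partial x_i}\,\frac{\partial\mathcal{L}}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)} , \tag{7.1.4} $$

$$ \frac{\delta L}{\delta\dot\varphi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\dot\varphi_n(\mathbf{x},t)} , \tag{7.1.5} $$

in which to save writing we have dropped the arguments of $L$ and $\mathcal{L}$.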
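
As a rough numerical illustration of Eq. (7.1.4), the following sketch discretizes the gradient and mass terms of a free-field Lagrangian on a periodic one-dimensional lattice and compares a finite-difference functional derivative with the formula. The lattice, the natural units ($\hbar = c = 1$), the sample field $\sin(kx)$, and all function names are assumptions made for this example only; the functional derivative is approximated by $(1/a)\,\partial L/\partial\varphi_j$, with $a$ the lattice spacing.

```python
import math

def lattice_lagrangian(phi, a, m):
    """Gradient and mass terms of L on a periodic 1D lattice (natural units)."""
    n = len(phi)
    total = 0.0
    for i in range(n):
        grad = (phi[(i + 1) % n] - phi[i]) / a
        total += a * (-0.5 * grad ** 2 - 0.5 * m ** 2 * phi[i] ** 2)
    return total

def functional_derivative(phi, a, m, j, eps=1e-4):
    """Estimate delta L / delta phi(x_j) as (1/a) dL/dphi_j by central differences."""
    up = list(phi)
    down = list(phi)
    up[j] += eps
    down[j] -= eps
    return (lattice_lagrangian(up, a, m) - lattice_lagrangian(down, a, m)) / (2 * eps * a)

def formula_7_1_4(phi, a, m, j):
    """Right-hand side of Eq. (7.1.4): -m^2 phi plus the discrete second derivative."""
    n = len(phi)
    laplacian = (phi[(j + 1) % n] - 2 * phi[j] + phi[(j - 1) % n]) / a ** 2
    return -m ** 2 * phi[j] + laplacian

# Hypothetical sample: 200 sites on a circle of length 2*pi, field sin(3x).
n_sites, mass, k = 200, 0.7, 3
a = 2 * math.pi / n_sites
phi = [math.sin(k * i * a) for i in range(n_sites)]
site = 17
numeric = functional_derivative(phi, a, mass, site)
formula = formula_7_1_4(phi, a, mass, site)
continuum = -(mass ** 2 + k ** 2) * phi[site]   # continuum value of -m^2 phi + phi''
print(numeric, formula, continuum)
```

The first two numbers agree to rounding error, since the discretized Lagrangian is quadratic in the field; the third differs only by the lattice approximation to the second derivative.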


Field Equations 


We take the derivatives of the Lagrangian in the equations of motion (5.7.12) to 
be functional derivatives: 


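$$ \frac{\partial}{\partial t}\left[\frac{\delta L}{\delta\dot\varphi_n(\mathbf{x},t)}\right] = \frac{\delta L}{\delta\varphi_n(\mathbf{x},t)} . \tag{7.1.6} $$

Equations (7.1.4) and (7.1.5) then give the field equations

$$ \frac{\partial}{\partial t}\left[\frac{\partial\mathcal{L}}{\partial\dot\varphi_n(\mathbf{x},t)}\right] = \frac{\partial\mathcal{L}}{\partial\varphi_n(\mathbf{x},t)} - \frac{\partial}{\partial x_i}\left[\frac{\partial\mathcal{L}}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x_i)}\right] . \tag{7.1.7} $$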


These are known as the Euler-Lagrange equations. We can put Eq. (7.1.7) into 
a form that appears more consistent with Lorentz invariance: 


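$$ \frac{\partial\mathcal{L}}{\partial\varphi_n(\mathbf{x},t)} = \frac{\partial}{\partial x^\mu}\left[\frac{\partial\mathcal{L}}{\partial(\partial\varphi_n(\mathbf{x},t)/\partial x^\mu)}\right] , \tag{7.1.8} $$

in which as usual the repeated index $\mu$ is summed over the values $\mu = 1, 2, 3, 0$,
again with $x^0 = ct$.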


Commutation Relations 


The field equations (7.1.8) could have been derived more easily by directly 
requiring that the action (7.1.1) must be stationary with respect to arbitrary 
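infinitesimal variations of the $\varphi_n(\mathbf{x}, t)$ that vanish when $|\mathbf{x}| \to \infty$ or when
$|t| \to \infty$. The calculation of functional derivatives is however important in
finding the commutation relations of the fields. The canonical conjugate $\pi_n(\mathbf{x}, t)$
to $\varphi_n(\mathbf{x}, t)$ is defined as in Eq. (5.7.13) but with a functional derivative of $L$ in
place of an ordinary derivative:

$$ \pi_n(\mathbf{x},t) \equiv \frac{\delta L}{\delta\dot\varphi_n(\mathbf{x},t)} = \frac{\partial\mathcal{L}}{\partial\dot\varphi_n(\mathbf{x},t)} . \tag{7.1.9} $$

The canonical commutation relations (5.7.5) here read

$$ [\varphi_n(\mathbf{x},t), \pi_m(\mathbf{y},t)] = i\hbar\,\delta_{nm}\,\delta^3(\mathbf{x} - \mathbf{y}) , \qquad
[\varphi_n(\mathbf{x},t), \varphi_m(\mathbf{y},t)] = [\pi_n(\mathbf{x},t), \pi_m(\mathbf{y},t)] = 0 . \tag{7.1.10} $$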


We will explore the consequences of these relations in the next section. 


Energy and Momentum 


In order to calculate the energies of the various states in a quantum field theory, 
we need to know the Hamiltonian. Returning to Eq. (5.7.14) and again replacing 
derivatives with functional derivatives and sums with sums and integrals, we 
have 


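$$ H = \sum_n \int d^3x \; \frac{\delta L}{\delta\dot\varphi_n(\mathbf{x},t)}\,\dot\varphi_n(\mathbf{x},t) - L = \int d^3x \left[ \sum_n \pi_n(\mathbf{x},t)\,\dot\varphi_n(\mathbf{x},t) - \mathcal{L} \right] , \tag{7.1.11} $$

evaluated at any time $t$. As explained in Section 5.7, the momentum operator of
any system is the generator of space translations. Under an infinitesimal space
translation $\mathbf{x} \to \mathbf{x} + \boldsymbol{\epsilon}$, the fields are changed by

$$ \varphi_n(\mathbf{x},t) \to \varphi_n(\mathbf{x} + \boldsymbol{\epsilon}, t) = \varphi_n(\mathbf{x},t) + \boldsymbol{\epsilon}\cdot\nabla\varphi_n(\mathbf{x},t) , $$

so, according to the rule of Eq. (5.7.16), the momentum operator is

$$ \mathbf{P} = -\sum_n \int d^3x \; \frac{\delta L}{\delta\dot\varphi_n(\mathbf{x},t)}\,\nabla\varphi_n(\mathbf{x},t) = -\int d^3x \sum_n \pi_n(\mathbf{x},t)\,\nabla\varphi_n(\mathbf{x},t) . \tag{7.1.12} $$


We next consider the simplest example of a quantum field theory, with a single
real scalar field $\varphi(x)$, “free” in the sense that the field equations are linear.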
Of course, we are really interested in what happens when fields interact, but, 
as we will see in the next section, the first step in dealing with interacting fields 
is to understand the content of the free-field theory. 

We will take the Lagrangian density to have the form 


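$$ \mathcal{L}_0(x) = -\frac{1}{2}\,\eta^{\mu\nu}\,\frac{\partial\varphi(x)}{\partial x^\mu}\,\frac{\partial\varphi(x)}{\partial x^\nu} - \frac{m^2c^2}{2\hbar^2}\,\varphi^2(x) , \tag{7.2.1} $$

the justification being that, as we shall see, this gives a sensible theory of
free spinless particles of mass $m$. (We are using the conventions described in
Chapter 4, with $x^0 = ct$; repeated indices are summed, with $\eta^{\mu\nu} = +1$ for
$\mu = \nu = 1, 2, 3$, $\eta^{\mu\nu} = -1$ for $\mu = \nu = 0$, and $\eta^{\mu\nu} = 0$ otherwise. This makes
the action $\int d^4x\,\mathcal{L}_0$ Lorentz invariant.) The subscript 0 on $\mathcal{L}$ is to remind us that
this is just the part of the Lagrangian density that would describe free fields if
nothing else were added. We will have to add additional terms in the following
section to include interactions.

The field equations (7.1.8) here are

$$ -\frac{m^2c^2}{\hbar^2}\,\varphi(x) = -\eta^{\mu\nu}\,\frac{\partial}{\partial x^\mu}\left[\frac{\partial\varphi(x)}{\partial x^\nu}\right] , $$

or more simply

$$ \left(\Box - m^2c^2/\hbar^2\right)\varphi = 0 , \tag{7.2.2} $$

where $\Box$ is the d'Alembertian operator:

$$ \Box \equiv \eta^{\mu\nu}\,\frac{\partial^2}{\partial x^\mu\,\partial x^\nu} = \nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} . \tag{7.2.3} $$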


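The general real solutions of Eq. (7.2.2) are of the form

$$ \varphi(\mathbf{x},t) = \int d^3p \left[ A(\mathbf{p}) \exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) + A^\dagger(\mathbf{p}) \exp\left(-\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) \right] , \tag{7.2.4} $$

where $E(\mathbf{p}) = \sqrt{c^2\mathbf{p}^2 + m^2c^4}$, and the coefficients $A(\mathbf{p})$ and $A^\dagger(\mathbf{p})$ are
spacetime-independent operators whose properties are to be determined from
the canonical commutation relations.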
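
A quick numerical sanity check of the statement that each plane wave in Eq. (7.2.4) solves Eq. (7.2.2) can be sketched as follows, in one space dimension and with finite differences standing in for the derivatives. The function names, the sample values of $\hbar$, $c$, $m$, $p$, and the step size are illustrative assumptions, not values from the text.

```python
import cmath
import math

def energy(p, m, c):
    """E(p) = sqrt(c^2 p^2 + m^2 c^4), as defined below Eq. (7.2.4)."""
    return math.sqrt(c ** 2 * p ** 2 + m ** 2 * c ** 4)

def plane_wave(x, t, p, m, hbar, c):
    """One positive-frequency term of Eq. (7.2.4), with A(p) set to 1."""
    return cmath.exp(1j * (p * x - energy(p, m, c) * t) / hbar)

def klein_gordon_residual(x, t, p, m, hbar, c, h=1e-3):
    """Finite-difference estimate of (Box - m^2 c^2 / hbar^2) applied to the wave."""
    def f(xx, tt):
        return plane_wave(xx, tt, p, m, hbar, c)
    d2x = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h ** 2
    d2t = (f(x, t + h) - 2 * f(x, t) + f(x, t - h)) / h ** 2
    return d2x - d2t / c ** 2 - (m * c / hbar) ** 2 * f(x, t)

# Arbitrary sample values, deliberately not in natural units.
residual = klein_gordon_residual(x=0.4, t=0.9, p=1.3, m=0.8, hbar=0.5, c=2.0)
mass_term = (0.8 * 2.0 / 0.5) ** 2      # size of the individual terms, for comparison
print(abs(residual), mass_term)
```

The residual is small compared with the individual terms in the equation, and shrinks further as the step `h` is reduced.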

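The canonical conjugate to $\varphi$ is here

$$ \pi(\mathbf{x},t) = \frac{\partial\mathcal{L}_0}{\partial\dot\varphi(\mathbf{x},t)} = \frac{1}{c^2}\,\dot\varphi(\mathbf{x},t) . \tag{7.2.5} $$

Then

$$ \begin{aligned}
[\varphi(\mathbf{x},t), \pi(\mathbf{y},t)] = \int d^3p \int d^3p' \left(\frac{-iE(\mathbf{p}')}{c^2\hbar}\right)
 \Big[ & \Big\{ A(\mathbf{p}) \exp\left(\tfrac{i}{\hbar}(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t)\right) + A^\dagger(\mathbf{p}) \exp\left(-\tfrac{i}{\hbar}(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t)\right) \Big\} , \\
 & \Big\{ A(\mathbf{p}') \exp\left(\tfrac{i}{\hbar}(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t)\right) - A^\dagger(\mathbf{p}') \exp\left(-\tfrac{i}{\hbar}(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t)\right) \Big\} \Big] .
\end{aligned} $$

Terms in the integrand that are proportional to the product $\exp(-iE(\mathbf{p})t/\hbar) \times
\exp(-iE(\mathbf{p}')t/\hbar)$ or to the product $\exp(iE(\mathbf{p})t/\hbar) \exp(iE(\mathbf{p}')t/\hbar)$ would make
different time-dependent contributions to the integral for any values of $\mathbf{p}$ and $\mathbf{p}'$,
so since the canonical commutation rules give a time-independent commutator,
we must have

$$ [A(\mathbf{p}), A(\mathbf{p}')] = [A^\dagger(\mathbf{p}), A^\dagger(\mathbf{p}')] = 0 . \tag{7.2.6} $$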


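The commutator is then

$$ \begin{aligned}
[\varphi(\mathbf{x},t), \pi(\mathbf{y},t)] = \int d^3p \int d^3p' \left(\frac{-iE(\mathbf{p}')}{c^2\hbar}\right)
 \Big\{ & -[A(\mathbf{p}), A^\dagger(\mathbf{p}')] \exp\left(\tfrac{i}{\hbar}(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t)\right) \exp\left(-\tfrac{i}{\hbar}(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t)\right) \\
 & -[A(\mathbf{p}'), A^\dagger(\mathbf{p})] \exp\left(\tfrac{i}{\hbar}(\mathbf{p}'\cdot\mathbf{y} - E(\mathbf{p}')t)\right) \exp\left(-\tfrac{i}{\hbar}(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t)\right) \Big\} .
\end{aligned} $$

The commutator must be proportional to $\delta^3(\mathbf{x} - \mathbf{y})$, and in particular must be a
function only of $\mathbf{x} - \mathbf{y}$, so the commutator of $A(\mathbf{p})$ with $A^\dagger(\mathbf{p}')$ must be proportional
to $\delta^3(\mathbf{p} - \mathbf{p}')$,

$$ [A(\mathbf{p}), A^\dagger(\mathbf{p}')] = f(\mathbf{p})\,\delta^3(\mathbf{p} - \mathbf{p}') , $$

which also ensures the cancellation of time-dependent factors. The commutator
is then

$$ [\varphi(\mathbf{x},t), \pi(\mathbf{y},t)] = \int d^3p \; \frac{iE(\mathbf{p})}{c^2\hbar}\,\big[f(\mathbf{p}) + f(-\mathbf{p})\big]\, e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})/\hbar} . $$

The canonical commutation relations require that

$$ [\varphi(\mathbf{x},t), \pi(\mathbf{y},t)] = i\hbar\,\delta^3(\mathbf{x} - \mathbf{y}) = \frac{i\hbar}{(2\pi\hbar)^3} \int d^3p \; e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})/\hbar} , $$

and therefore

$$ f(\mathbf{p}) + f(-\mathbf{p}) = \frac{c^2\hbar^2}{(2\pi\hbar)^3 E(\mathbf{p})} . $$

At this point, we have to look at the commutator of the field with itself. Using
what we have already learned about $[A(\mathbf{p}), A(\mathbf{p}')]$ and $[A(\mathbf{p}), A^\dagger(\mathbf{p}')]$, we see
that

$$ [\varphi(\mathbf{x},t), \varphi(\mathbf{y},t)] = \int d^3p \; \big[f(\mathbf{p}) - f(-\mathbf{p})\big]\, e^{i\mathbf{p}\cdot(\mathbf{x}-\mathbf{y})/\hbar} . $$

Since this commutator has to vanish, we must have $f(\mathbf{p}) = f(-\mathbf{p})$, so we must
take

$$ f(\mathbf{p}) = f(-\mathbf{p}) = \frac{c^2\hbar^2}{2(2\pi\hbar)^3 E(\mathbf{p})} . $$

It is therefore convenient to define $a(\mathbf{p}) \equiv A(\mathbf{p})/\sqrt{f(\mathbf{p})}$, so that

$$ [a(\mathbf{p}), a^\dagger(\mathbf{p}')] = \delta^3(\mathbf{p} - \mathbf{p}') \tag{7.2.7} $$

and

$$ \varphi(\mathbf{x},t) = \hbar c \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi\hbar)^{3/2}} \left[ a(\mathbf{p}) \exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) + a^\dagger(\mathbf{p}) \exp\left(-\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) \right] . \tag{7.2.8} $$

The operators $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ are analogous to the operators $a_i$ and $a_i^\dagger$ introduced
in our discussion of the harmonic oscillator Hamiltonian in Section 6.3
but with a continuum momentum argument in place of the three-valued index $i$
and a delta function instead of a Kronecker delta.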

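The Hamiltonian for the free scalar field is given by

$$ H_0 = \int d^3x \left[ \pi\dot\varphi - \mathcal{L}_0 \right] = \frac{1}{2} \int d^3x \left[ \frac{1}{c^2}\,\dot\varphi^2 + (\nabla\varphi)^2 + \frac{m^2c^2}{\hbar^2}\,\varphi^2 \right] . $$

Since this is quadratic in the field $\varphi$, when we insert the expression (7.2.8) for
$\varphi$ in $H_0$, we encounter a double integral over momentum. The integral over $\mathbf{x}$
yields $(2\pi\hbar)^3$ factors times momentum delta functions that reduce this to a single
momentum integral. The time-dependent terms in the integrand proportional
to $\exp(-2iE(\mathbf{p})t/\hbar)$ or $\exp(2iE(\mathbf{p})t/\hbar)$ are also proportional to $-E^2(\mathbf{p})/c^2 +
\mathbf{p}^2 + m^2c^2$, which vanishes. This leaves the time-independent terms

$$ H_0 = \frac{1}{2} \int d^3p \; E(\mathbf{p}) \left[ a^\dagger(\mathbf{p})\,a(\mathbf{p}) + a(\mathbf{p})\,a^\dagger(\mathbf{p}) \right] . \tag{7.2.9} $$

In the same way, we find the momentum operator

$$ \mathbf{P} = \frac{1}{2} \int d^3p \; \mathbf{p} \left[ a^\dagger(\mathbf{p})\,a(\mathbf{p}) + a(\mathbf{p})\,a^\dagger(\mathbf{p}) \right] . \tag{7.2.10} $$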

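We can check that Eq. (7.2.9) is consistent with what we know to be the
time dependence of the field. The canonical commutation relations have been
constructed so that

$$ i\hbar\,\dot\varphi(x) = [\varphi(x), H_0] = \hbar c \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi\hbar)^{3/2}} \left[ [a(\mathbf{p}), H_0] \exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) + [a^\dagger(\mathbf{p}), H_0] \exp\left(-\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) \right] . $$

From Eq. (7.2.9) we have

$$ [a(\mathbf{p}), H_0] = E(\mathbf{p})\,a(\mathbf{p}) , \qquad [a^\dagger(\mathbf{p}), H_0] = -E(\mathbf{p})\,a^\dagger(\mathbf{p}) , \tag{7.2.11} $$

so that

$$ \dot\varphi(x) = -ic \int \frac{d^3p \,\sqrt{E(\mathbf{p})}}{\sqrt{2}\,(2\pi\hbar)^{3/2}} \left[ a(\mathbf{p}) \exp\left(\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) - a^\dagger(\mathbf{p}) \exp\left(-\frac{i}{\hbar}\big(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t\big)\right) \right] , $$

which is the same as would be given directly by taking the time derivative of
Eq. (7.2.8).

Likewise, from Eq. (7.2.10), we have

$$ [a(\mathbf{p}), \mathbf{P}] = \mathbf{p}\,a(\mathbf{p}) , \qquad [a^\dagger(\mathbf{p}), \mathbf{P}] = -\mathbf{p}\,a^\dagger(\mathbf{p}) , \tag{7.2.12} $$

from which we can see that, as expected, $[\varphi(x), \mathbf{P}] = -i\hbar\nabla\varphi(x)$.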

Equations (7.2.11) and (7.2.12) show that $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ act as annihilation
and creation operators, analogous to the lowering and raising operators for
energy $a_i$ and $a_i^\dagger$ in Section 6.3 and to the operators $J_1 - iJ_2$ and $J_1 + iJ_2$ that we
used in working out the content of angular momentum multiplets in Section 5.4.
Suppose a state represented by a wave function $\psi$ has definite values $E_\psi$ and
$\mathbf{p}_\psi$ for the total energy and total momentum. That is,

$$ H_0\psi = E_\psi\,\psi , \qquad \mathbf{P}\psi = \mathbf{p}_\psi\,\psi . $$

Then

$$ H_0\,a(\mathbf{p})\psi = [H_0, a(\mathbf{p})]\psi + a(\mathbf{p})H_0\psi = \big(E_\psi - E(\mathbf{p})\big)\,a(\mathbf{p})\psi ; $$

so if $a(\mathbf{p})\psi$ does not vanish, it is the wave function for a state with energy
$E_\psi - E(\mathbf{p})$. Likewise, this is a state with momentum $\mathbf{p}_\psi - \mathbf{p}$, while $a^\dagger(\mathbf{p})\psi$
is the wave function for a state with energy $E_\psi + E(\mathbf{p})$ and total momentum
$\mathbf{p}_\psi + \mathbf{p}$. In other words, $a(\mathbf{p})$ and $a^\dagger(\mathbf{p})$ respectively annihilate and create a particle
of momentum $\mathbf{p}$. This is what we mean when we refer to elementary particles
being bundles of the energy and momentum in some field.
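
The lowering and raising action can be made concrete in a toy model. The sketch below assumes a single discrete mode (so the delta function in Eq. (7.2.7) becomes a Kronecker delta), truncates the number basis at a hypothetical dimension `DIM`, and takes $H_0 = E\,a^\dagger a$ with the constant term dropped. It then checks the analogues of Eq. (7.2.11) with plain matrices; none of these names or values come from the text.

```python
import math

def annihilation_matrix(dim):
    """Matrix of a in the number basis |0>, ..., |dim-1>: a|n> = sqrt(n)|n-1>."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)
    return a

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def scaled(m, s):
    return [[s * x for x in row] for row in m]

def max_abs_diff(a, b):
    return max(abs(x - y) for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

DIM, E_MODE = 8, 2.5                      # hypothetical truncation and mode energy
a = annihilation_matrix(DIM)
a_dag = transpose(a)
h0 = scaled(matmul(a_dag, a), E_MODE)     # E * a†a, constant term dropped

# Single-mode analogues of Eq. (7.2.11): [a, H0] = E a and [a†, H0] = -E a†.
lowering_error = max_abs_diff(commutator(a, h0), scaled(a, E_MODE))
raising_error = max_abs_diff(commutator(a_dag, h0), scaled(a_dag, -E_MODE))

# Acting with a on the three-quantum state lowers the energy from 3E to 2E.
state = [0.0] * DIM
state[3] = 1.0
lowered = [sum(a[i][j] * state[j] for j in range(DIM)) for i in range(DIM)]
norm = sum(x * x for x in lowered)
energy_after = sum(h0[i][i] * lowered[i] ** 2 for i in range(DIM)) / norm
print(lowering_error, raising_error, energy_after)
```

With $H_0$ built from $a^\dagger a$ alone, both commutation relations hold exactly even in the truncated space, which is why this form was chosen for the sketch.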

At this point, and for the rest of this chapter, we will abandon the language of 
wave mechanics, and instead employ the more abstract language of state vectors 
and scalar products that was outlined in Section 5.10. In quantum field theory 
the wave function of any state such as the vacuum is a complicated functional of 
the fields, and the action of operators like a(p) on these wave functions involves 
functional derivatives with respect to these fields. None of these complications 
plays a role in most calculations. What we use instead are the properties of 
operators, such as the field equations and the canonical commutation relations, 
and limited assumptions about physical states. 

In particular, it is a plausible physical assumption that there should exist a 
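physical state, the vacuum $\Psi_{\rm vac}$, with the lowest possible energy. Then $a(\mathbf{p})\Psi_{\rm vac}$
must vanish,

$$ a(\mathbf{p})\Psi_{\rm vac} = 0 , \tag{7.2.13} $$

since otherwise it would be a state with energy less by an amount $E(\mathbf{p})$.
To calculate the energy and momentum of the vacuum, it is convenient to use
the commutator of $a$ with $a^\dagger$ to rewrite Eqs. (7.2.9) and (7.2.10) as

$$ H_0 = \int d^3p \; E(\mathbf{p})\, a^\dagger(\mathbf{p})\,a(\mathbf{p}) + E_{\rm vac} , \tag{7.2.14} $$

$$ \mathbf{P} = \int d^3p \; \mathbf{p}\, a^\dagger(\mathbf{p})\,a(\mathbf{p}) , \tag{7.2.15} $$

where $E_{\rm vac}$ is an infinite constant:

$$ E_{\rm vac} = \frac{1}{2} \int d^3p \; E(\mathbf{p})\,\delta^3(0) = \frac{1}{2(2\pi\hbar)^3} \int d^3p \; E(\mathbf{p}) \int d^3x . \tag{7.2.16} $$

From Eqs. (7.2.14) and (7.2.13), we see that this is the energy of the vacuum:

$$ H_0\Psi_{\rm vac} = E_{\rm vac}\Psi_{\rm vac} , \tag{7.2.17} $$

while Eqs. (7.2.15) and (7.2.13) show that the momentum of the vacuum is zero.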

For most purposes a constant term in the energy such as $E_{\rm vac}$ makes no
difference, because the same constant appears in the energies of all states and
therefore has no effect in applications of the conservation of energy. The one
phenomenon that is affected by such a constant is gravitation, which is coupled
to all forms of energy. In a finite volume, Eq. (7.2.16) corresponds to an infinite
vacuum energy density

$$ \rho_{\rm vac} = \frac{1}{2(2\pi\hbar)^3} \int d^3p \; E(\mathbf{p}) . $$

But Einstein’s general theory of relativity allows a term in the field equations of
gravitation, known as the cosmological constant, that has just the same effects
as $\rho_{\rm vac}$. There is no reason why the cosmological constant should not include
an infinite negative term that simply cancels $\rho_{\rm vac}$, possibly leaving over a finite
remaining energy density. Observations of an accelerated expansion of the universe
have shown that this remaining energy density is not zero, though it is tiny
compared with the energy densities encountered in atomic and nuclear physics.¹
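
To see how badly the vacuum energy density diverges, one can cut the momentum integral off at some $|\mathbf{p}| < \Lambda$ and evaluate it numerically. The sketch below does this with Simpson's rule in natural units ($\hbar = c = 1$); the cutoff, the function name, and the sample values are assumptions for illustration, and the cutoff is only a device, not something the text introduces. For $m = 0$ the integral can be done by hand and gives $\Lambda^4/16\pi^2$.

```python
import math

def vacuum_energy_density(cutoff, mass, steps=2000):
    """(1 / (2 (2 pi)^3)) * integral of E(p) d^3p over |p| < cutoff, natural units."""
    if steps % 2:
        steps += 1                         # Simpson's rule needs an even step count
    h = cutoff / steps

    def integrand(p):
        # Radial integrand p^2 E(p); the angular integral contributes 4 pi.
        return p * p * math.sqrt(p * p + mass * mass)

    total = integrand(0.0) + integrand(cutoff)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    radial = total * h / 3
    return 4 * math.pi * radial / (2 * (2 * math.pi) ** 3)

massless = vacuum_energy_density(cutoff=10.0, mass=0.0)
exact_massless = 10.0 ** 4 / (16 * math.pi ** 2)
growth = vacuum_energy_density(100.0, 1.0) / vacuum_energy_density(50.0, 1.0)
print(massless, exact_massless, growth)
```

Doubling the cutoff multiplies the result by very nearly 16, the quartic growth that makes the uncut integral infinite.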

The quantum states of this free field can be constructed by acting on the 
vacuum with any number of creation operators. If we define

$$ \Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} \equiv a^\dagger(\mathbf{p}_1)\,a^\dagger(\mathbf{p}_2)\,a^\dagger(\mathbf{p}_3) \cdots \Psi_{\rm vac} , \tag{7.2.18} $$

then from Eqs. (7.2.11), (7.2.12), and (7.2.17) we see that

$$ H_0\,\Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} = \big[E_{\rm vac} + E(\mathbf{p}_1) + E(\mathbf{p}_2) + E(\mathbf{p}_3) + \cdots\big]\,\Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} \tag{7.2.19} $$

and

$$ \mathbf{P}\,\Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} = \big[\mathbf{p}_1 + \mathbf{p}_2 + \mathbf{p}_3 + \cdots\big]\,\Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots} . \tag{7.2.20} $$

These are states with any number of particles. The superpositions of all
such states make up what is called Fock space, named after Vladimir Fock
(1898-1974). Because the operators $a^\dagger(\mathbf{p}_1)$, $a^\dagger(\mathbf{p}_2)$, etc. all commute with one
another, the states (7.2.18) are symmetric in the momenta of the particles, and
hence these spinless particles are bosons.

The states $\Psi_{\mathbf{p}_1,\mathbf{p}_2,\mathbf{p}_3,\ldots}$ are no longer eigenstates of the Hamiltonian if we
add higher-order terms such as $\varphi^3$, $\varphi^4$, etc. to the Lagrangian density. Such
terms drive transitions between these states, corresponding to the creation and
annihilation of particles. We will discuss this further in the next section. As we
will see there, knowledge of the free-field theory is an essential ingredient in
these calculations, which is why we have gone into it here.


1 For a textbook discussion and references to the original literature, see S. Weinberg, Cosmology (Oxford
University Press, Oxford, 2008), Sections 1.4 and 1.5.


We shall now consider how to calculate transition rates in theories of interacting
fields. Most (though not all) useful calculations in quantum field theory rely on
perturbation theory. We write the Hamiltonian as $H = H_0 + H'$, where $H_0$
is the Hamiltonian of a free-field theory, like the Hamiltonian discussed in the
previous section, and $H'$ is an interaction term that is considered to be small
enough to allow physical quantities to be calculated as power series in $H'$.

In Section 5.9 we saw how perturbation theory is used to calculate shifts in 
energy levels in quantum mechanics. In second and higher orders in a perturba- 
tion, we encounter energy denominators, such as that shown in Eq. (5.9.27). 
Similar energy denominators occur in perturbative calculations of scattering 
amplitudes using the Lippmann–Schwinger equation (5.6.29) and its iterations.
The fact that denominators involve energy but not momentum differences makes 
it obvious that they are not Lorentz invariant, so they make it difficult to keep 
track of Lorentz invariance in relativistic theories. This sort of perturbation 
theory, which is now known as “old-fashioned perturbation theory,” was all that 
was available for calculations in quantum field theory in the 1930s, making 
progress difficult. In particular, it was not clear how to deal with the divergent 
integrals occurring in these calculations without losing the Lorentz invariance 
of the underlying theory. 

In the late 1940s, independently, Richard Feynman² (1918-1988), Julian
Schwinger³ (1918-1994), and Sin-Itiro Tomonaga (1906-1979) and his collaborators⁴
were able to carry out manifestly relativistic perturbative calculations
in quantum electrodynamics. The equivalence of their methods was shown by
Freeman Dyson⁵ (1923-2020), who gave a systematic account of a method
of calculation that would maintain manifest Lorentz invariance to all orders
of perturbation theory. We shall now describe this method. Here and for
the balance of this chapter, as is usual in work on quantum field theory, we
shall use what are called “natural units,” in which $\hbar = c = 1$, and we shall
continue to represent physical states as vectors in Hilbert space, as described in
Section 5.10.


2 R. P. Feynman, Rev. Mod. Phys. 20, 367 (1948); Phys. Rev. 74, 939, 1430 (1948); ibid., 76, 749, 769
(1949); ibid., 80, 440 (1950).

3 J. Schwinger, Phys. Rev. 74, 1439 (1948); ibid., 75, 651 (1949); ibid., 76, 790 (1949); ibid., 82, 664, 914
(1951); ibid., 91, 713 (1953); Proc. Nat. Acad. Sci. 37, 452 (1951).

4 S. Tomonaga, Prog. Theor. Phys. 1, 27 (1946); Z. Koba, T. Tati, and S. Tomonaga, ibid.,
2, 101 (1947); S. Kanesawa and S. Tomonaga, ibid., 3, 1, 101 (1948); S. Tomonaga, Phys. Rev. 74, 224
(1948); D. Ito, Z. Koba, and S. Tomonaga, Prog. Theor. Phys. 3, 276 (1948); Z. Koba and S. Tomonaga,
ibid., 3, 290 (1948).

5 F. J. Dyson, Phys. Rev. 75, 486, 1736 (1949).


Time-Ordered Perturbation Theory


We saw in the appendix to Section 5.6 how the rate for any transition $\alpha \to \beta$
between free-particle states $\alpha$ and $\beta$ can be calculated from a knowledge of the
S-matrix, $S_{\beta\alpha}$. Our first task now is to see how to express $S_{\beta\alpha}$ in a form that
allows its calculation in modern perturbation theory.

In the appendix to Section 5.6 we showed how to construct an eigenstate $\Psi_\alpha$
of the full Hamiltonian, with $H\Psi_\alpha = E_\alpha\Psi_\alpha$, with the special property that at
very early times it looks like the eigenstate $\Phi_\alpha$ of the free-particle Hamiltonian,
with $H_0\Phi_\alpha = E_\alpha\Phi_\alpha$, in the sense of Eq. (5.6.34): for $t \to -\infty$,

$$ \int g(\alpha)\,e^{-iE_\alpha t}\,\Psi_\alpha\,d\alpha \to \int g(\alpha)\,e^{-iE_\alpha t}\,\Phi_\alpha\,d\alpha , $$

where $g(\alpha)$ is a smooth function of the momenta of all the particles in state $\alpha$,
introduced to give meaning to the limit for $t \to -\infty$. (Recall that the label $\alpha$ is
intended to include the momenta, spin 3-components or helicities, and species
labels of all the particles in the state $\alpha$, and an integral over $\alpha$ is intended to
include integrals over all momenta and sums over all spin 3-components or
helicities and species labels.) We also showed that at very late times the same
state $\Psi_\alpha$ looks like the superposition $\int d\beta\, S_{\beta\alpha}\Phi_\beta$, in the sense of Eq. (5.6.35):
for $t \to +\infty$,

$$ \int g(\alpha)\,e^{-iE_\alpha t}\,\Psi_\alpha\,d\alpha \to \int g(\alpha)\,e^{-iE_\alpha t}\,d\alpha \int d\beta\; S_{\beta\alpha}\,\Phi_\beta . $$

With considerable loss of mathematical rigor, we multiply both sides by the
operator $e^{iHt}$ and equate coefficients of $g(\alpha)$ on both sides, so these formulas
yield

$$ \Psi_\alpha = \Omega(-\infty)\,\Phi_\alpha = \Omega(+\infty) \int d\beta\; S_{\beta\alpha}\,\Phi_\beta , $$

where

$$ \Omega(t) \equiv e^{iHt}\,e^{-iH_0 t} . $$

From the two equalities we can conclude that

$$ S_{\beta\alpha} = \big(\Phi_\beta, \Omega^{-1}(+\infty)\,\Omega(-\infty)\,\Phi_\alpha\big) = \big(\Phi_\beta, U(+\infty, -\infty)\,\Phi_\alpha\big) , \tag{7.3.1} $$

where

$$ U(t, t_0) \equiv \Omega^{-1}(t)\,\Omega(t_0) = e^{iH_0 t}\,e^{-iH(t - t_0)}\,e^{-iH_0 t_0} . \tag{7.3.2} $$


The justification (such as it is) for treating $\Omega(t)$ as if it had well-defined limits
for $t \to \pm\infty$ is that at very early and very late times the incoming and outgoing
particles are so far apart that the interaction $H - H_0$ is ineffective. As we shall
see, at least in perturbation theory the limits $t \to \pm\infty$ do lead to well-defined
probability amplitudes.

To construct a perturbation series for $U$ in powers of the interaction $H - H_0$,
we first take the derivative of Eq. (7.3.2) with respect to $t$:

$$ \frac{d}{dt}\,U(t, t_0) = -i \exp(iH_0 t)\,[H - H_0]\,\exp(-iH(t - t_0))\,\exp(-iH_0 t_0) = -iH'(t)\,U(t, t_0) , \tag{7.3.3} $$

where $H'(t)$ is the interaction in what is called the interaction picture, in which
the time dependence is governed by the free-particle Hamiltonian $H_0$:

$$ H'(t) \equiv \exp(iH_0 t)\,[H - H_0]\,\exp(-iH_0 t) . \tag{7.3.4} $$

The time dependence of any operator in the interaction picture is given by its
commutator with $H_0$, which is one reason why we need to understand the free-field
theory before taking interactions into account.

The differential equation (7.3.3) together with the initial condition $U(t_0, t_0) = 1$ is
incorporated in the integral equation

$$ U(t, t_0) = 1 - i \int_{t_0}^{t} H'(t_1)\,U(t_1, t_0)\,dt_1 . \tag{7.3.5} $$


We can solve this at least formally by iteration:

$$ U(t, t_0) = 1 - i \int_{t_0}^{t} dt_1\, H'(t_1) + (-i)^2 \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\, H'(t_1)\,H'(t_2) + \cdots . \tag{7.3.6} $$

Instead of using limits on the integrals to impose an ordering of the integration
variables $t_1$, $t_2$, etc., we can integrate all these variables over the whole range
from $t_0$ to $t$, which with $n$ integrals includes $n!$ permutations of the order of
the integration variables; we then correct for this multiplicity of permutations
by dividing by $n!$ and reimpose the ordering of time variables by changing the
product of $H'$ operators to a time-ordered product denoted $T\{\cdots\}$, in which
operator factors appear in order of decreasing time arguments. For instance

$$ \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\, H'(t_1)\,H'(t_2) = \frac{1}{2} \int_{t_0}^{t} dt_1 \int_{t_0}^{t} dt_2\; T\{H'(t_1)\,H'(t_2)\} , $$

where

$$ T\{H'(t_1)\,H'(t_2)\} = \begin{cases} H'(t_1)\,H'(t_2) & t_1 > t_2 \\ H'(t_2)\,H'(t_1) & t_2 > t_1 . \end{cases} $$

The complete sum is then

$$ U(t, t_0) = 1 + \sum_{n=1}^{\infty} \frac{(-i)^n}{n!} \int_{t_0}^{t} dt_1 \cdots \int_{t_0}^{t} dt_n\; T\{H'(t_1) \cdots H'(t_n)\} . \tag{7.3.7} $$

This begins to look Lorentz invariant if we take the limits $t \to +\infty$ and
$t_0 \to -\infty$ and suppose that $H'(t)$ is the integral over all space of a scalar
density $\mathcal{H}'(x)$, such as a polynomial function of the field $\varphi(x)$ discussed in the
previous section:

$$ H'(t) = \int d^3x\; \mathcal{H}'(\mathbf{x}, t) , \tag{7.3.8} $$

in which case Eq. (7.3.7) becomes

$$ U(\infty, -\infty) = 1 + \sum_{n=1}^{\infty} \frac{(-i)^n}{n!} \int d^4x_1 \cdots \int d^4x_n\; T\{\mathcal{H}'(x_1) \cdots \mathcal{H}'(x_n)\} , \tag{7.3.9} $$


which can be used along with Eq. (7.3.1) to calculate the S-matrix. 
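
The iterative solution (7.3.6) can be tested in a toy setting. The sketch below assumes a hypothetical two-level system with $H_0 = {\rm diag}(0, \omega)$ and interaction $g\sigma_x$, for which $H'(t)$ in the interaction picture is known in closed form. It integrates Eq. (7.3.3) numerically with a Runge-Kutta step and compares the result with the series truncated at first and at second order. The model, the parameter values, and the helper names are all assumptions made for this illustration.

```python
import cmath

G, OMEGA = 0.05, 1.0                      # hypothetical coupling and level spacing
IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
ZERO = [[0j, 0j], [0j, 0j]]

def h_int(t):
    """H'(t) = exp(i H0 t) V exp(-i H0 t) for H0 = diag(0, OMEGA), V = G * sigma_x."""
    return [[0j, G * cmath.exp(-1j * OMEGA * t)],
            [G * cmath.exp(1j * OMEGA * t), 0j]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def axpy(a, b, s):
    """Return a + s * b for 2x2 matrices."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]

def evolve(t0, t, steps=2000):
    """Integrate dU/dt = -i H'(t) U, Eq. (7.3.3), by fourth-order Runge-Kutta."""
    h = (t - t0) / steps

    def rhs(s, m):
        return [[-1j * x for x in row] for row in matmul(h_int(s), m)]

    u = IDENTITY
    for n in range(steps):
        s = t0 + n * h
        k1 = rhs(s, u)
        k2 = rhs(s + h / 2, axpy(u, k1, h / 2))
        k3 = rhs(s + h / 2, axpy(u, k2, h / 2))
        k4 = rhs(s + h, axpy(u, k3, h))
        for k, weight in ((k1, 1), (k2, 2), (k3, 2), (k4, 1)):
            u = axpy(u, k, weight * h / 6)
    return u

def dyson_terms(t0, t, steps=2000):
    """First- and second-order integrals in Eq. (7.3.6), by the trapezoid rule."""
    h = (t - t0) / steps
    inner, second = ZERO, ZERO             # inner: running integral of H' from t0
    prev_h = h_int(t0)
    prev_outer = matmul(prev_h, inner)
    for n in range(1, steps + 1):
        cur_h = h_int(t0 + n * h)
        inner = axpy(inner, axpy(prev_h, cur_h, 1.0), h / 2)
        cur_outer = matmul(cur_h, inner)   # H'(t1) times integral of H'(t2), t2 < t1
        second = axpy(second, axpy(prev_outer, cur_outer, 1.0), h / 2)
        prev_h, prev_outer = cur_h, cur_outer
    return inner, second

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

u_numeric = evolve(0.0, 2.0)
first, second = dyson_terms(0.0, 2.0)
u_first = axpy(IDENTITY, first, -1j)       # 1 - i * (first-order integral)
u_second = axpy(u_first, second, -1.0)     # plus (-i)^2 * (second-order integral)
err_first = max_abs_diff(u_numeric, u_first)
err_second = max_abs_diff(u_numeric, u_second)
print(err_first, err_second)
```

With this small coupling, including the second-order term reduces the discrepancy by more than an order of magnitude, as expected for a series in powers of $H'$.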


Lorentz Invariance 


The remaining problem with Lorentz invariance is that the integrand is still
time-ordered. As we saw in Section 4.7, the ordering in time of two events at
spacetime positions $x_1$ and $x_2$ is Lorentz invariant if the separation $x_1 - x_2$ is
time-like or light-like, that is (using units with $c = 1$), if

$$ \eta_{\mu\nu}(x_1 - x_2)^\mu (x_1 - x_2)^\nu = (\mathbf{x}_1 - \mathbf{x}_2)^2 - (t_1 - t_2)^2 \le 0 . $$

Thus to make the scattering operator Lorentz invariant we need the densities to
commute at space-like separations:

$$ [\mathcal{H}'(x_1), \mathcal{H}'(x_2)] = 0 \quad \text{for} \quad \eta_{\mu\nu}(x_1 - x_2)^\mu (x_1 - x_2)^\nu > 0 . \tag{7.3.10} $$

The vanishing of this commutator tells us that there is no obstacle to finding
states that are eigenstates of both $\mathcal{H}'(x_1)$ and $\mathcal{H}'(x_2)$, which can also be justified
on grounds of causality since for $x_1 - x_2$ space-like no signal could travel from
a measurement of $\mathcal{H}'$ at $x_1$ to interfere with a measurement of $\mathcal{H}'$ at $x_2$.
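
The statement that time ordering is frame-dependent only for space-like separations is easy to check numerically. The sketch below assumes one space dimension, units with $c = 1$, and a hypothetical grid of boost velocities; the function names are invented for this example.

```python
import math

def boost(dt, dx, v):
    """Lorentz boost of a separation (dt, dx) along x with velocity v, c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (dt - v * dx), gamma * (dx - v * dt)

def interval(dt, dx):
    """eta_{mu nu} dx^mu dx^nu with eta = diag(+1, -1) for (space, time)."""
    return dx * dx - dt * dt

def time_order_can_flip(dt, dx, velocities):
    """True if the sign of the boosted time difference depends on the frame."""
    signs = {math.copysign(1.0, boost(dt, dx, v)[0]) for v in velocities}
    return len(signs) > 1

velocities = [k / 100 for k in range(-99, 100)]              # |v| < 1
spacelike_flips = time_order_can_flip(1.0, 2.0, velocities)  # interval = +3 > 0
timelike_flips = time_order_can_flip(2.0, 1.0, velocities)   # interval = -3 < 0
boosted = boost(1.0, 2.0, 0.6)
print(spacelike_flips, timelike_flips, interval(*boosted))
```

For the space-like pair the boosted time difference changes sign once $v$ exceeds $1/2$, while for the time-like pair it never does; the interval itself is the same in every frame.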

Any space-like separation $x_1 - x_2$ can be obtained from a purely spatial
separation with $t_1 = t_2$ by a Lorentz transformation, so as long as $\mathcal{H}'(x)$ is a
scalar, the necessary and sufficient condition for (7.3.10) is that the commutator
should vanish at equal times:

$$ [\mathcal{H}'(\mathbf{x}_1, t), \mathcal{H}'(\mathbf{x}_2, t)] = 0 . \tag{7.3.11} $$

The scalar field $\varphi(\mathbf{x}, t)$ introduced in the previous section satisfies the commutation
relation $[\varphi(\mathbf{x}_1, t), \varphi(\mathbf{x}_2, t)] = 0$ for any positions $\mathbf{x}_1$ and $\mathbf{x}_2$, so an interaction
Hamiltonian density $\mathcal{H}'$ constructed as any polynomial function of $\varphi$ will satisfy
Eq. (7.3.11). As we shall see in the next section, the condition (7.3.11) is not
so easy to satisfy in more general theories, and this leads to the necessity of
antiparticles.


Example: Scattering


To make all this concrete, let us calculate the lowest-order amplitude for scattering
of a pair of particles in the theory of real scalar fields, with $H_0$ the
free-particle Hamiltonian described in the previous section, and with the simple
interaction Hamiltonian density

$$ \mathcal{H}' = \frac{g}{6}\,\varphi^3 , \tag{7.3.12} $$

with $g$ a constant taken small enough to justify the use of perturbation theory.⁶
(The factor 1/6 is inserted for later convenience.) To lowest order in $g$, the
S-matrix element for particles with momenta $\mathbf{p}_1$ and $\mathbf{p}_2$ to scatter, with momenta
changing to $\mathbf{p}_1'$ and $\mathbf{p}_2'$, is

$$ S_{\mathbf{p}_1'\mathbf{p}_2',\,\mathbf{p}_1\mathbf{p}_2} = \frac{1}{2}\left(\frac{-ig}{6}\right)^2 \int d^4x\, d^4y\; \big(\Phi_{\mathbf{p}_1'\mathbf{p}_2'}, T\{\varphi^3(x), \varphi^3(y)\}\,\Phi_{\mathbf{p}_1\mathbf{p}_2}\big) , \tag{7.3.13} $$

where

$$ \Phi_{\mathbf{p}_1\mathbf{p}_2} \equiv \frac{1}{\sqrt{2}}\,a^\dagger(\mathbf{p}_1)\,a^\dagger(\mathbf{p}_2)\,\Phi_0 , \tag{7.3.14} $$

with $\Phi_0$ the free-particle vacuum state. The factor $1/\sqrt{2}$ is included to compensate
for the sum of two delta functions in the scalar product; using Eqs. (7.2.7)
and (7.2.13),

$$ \big(\Phi_{\mathbf{p}_1'\mathbf{p}_2'}, \Phi_{\mathbf{p}_1\mathbf{p}_2}\big) = \frac{1}{2}\left[ \delta^3(\mathbf{p}_1' - \mathbf{p}_1)\,\delta^3(\mathbf{p}_2' - \mathbf{p}_2) + \delta^3(\mathbf{p}_1' - \mathbf{p}_2)\,\delta^3(\mathbf{p}_2' - \mathbf{p}_1) \right] . \tag{7.3.15} $$

There is no term in this S-matrix element that is of first order in $g$, because
there are not enough creation and annihilation operators in a single $\varphi^3$ operator
to destroy the two initial particles and create the two final particles.

Our strategy in calculating the scattering amplitude (7.3.13) will be to move
a pair of the annihilation operators in $\varphi^3(x)$ and/or $\varphi^3(y)$ past the creation operators
in $\Phi_{\mathbf{p}_1\mathbf{p}_2}$, which gives a pair of commutators of annihilation with creation
operators, and use the fact that annihilation operators give zero when acting
on $\Phi_0$; also, to move a pair of the creation operators in $\varphi^3(x)$ and/or $\varphi^3(y)$
to the left side of the scalar product, so that their adjoints act as annihilation
operators on $\Phi_{\mathbf{p}_1'\mathbf{p}_2'}$, and then move them past the creation operators in $\Phi_{\mathbf{p}_1'\mathbf{p}_2'}$,
giving another pair of commutators of annihilation with creation operators and
again using the fact that annihilation operators give zero when acting on $\Phi_0$.⁷


6 This theory is actually unphysical, because $\mathcal{H}' \to -\infty$ if $g > 0$ and $\varphi \to -\infty$ or if $g < 0$ and $\varphi \to +\infty$.
This problem does not emerge in perturbation theory, and in any case can be dealt with by adding higher
even powers of $\varphi$ with positive coefficients in $\mathcal{H}'$.

7 In moving these annihilation operators out of the time-ordered product to the right or their adjoints to
the left, we are ignoring their commutators with the other fields in $\varphi^3(x)$ and $\varphi^3(y)$, because these terms
involve momentum delta functions that vanish if we assume that neither $\mathbf{p}_1'$ nor $\mathbf{p}_2'$ equal either $\mathbf{p}_1$ or $\mathbf{p}_2$.


If we separate the annihilation and creation parts of $\varphi(x)$, so that

$$ \varphi(x) = \varphi_{\rm an}(x) + \varphi_{\rm an}^\dagger(x) , \tag{7.3.16} $$

$$ \varphi_{\rm an}(\mathbf{x}, t) \equiv \int \frac{d^3p}{\sqrt{2E(\mathbf{p})}\,(2\pi)^{3/2}}\; a(\mathbf{p}) \exp\big(i(\mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t)\big) , \tag{7.3.17} $$

then

$$ [\varphi_{\rm an}(x), a^\dagger(\mathbf{p})] = \frac{1}{\sqrt{2E(\mathbf{p})}\,(2\pi)^{3/2}}\; e^{ip\cdot x} , \tag{7.3.18} $$

where

$$ p\cdot x \equiv \eta_{\mu\nu}\,p^\mu x^\nu = \mathbf{p}\cdot\mathbf{x} - E(\mathbf{p})t . $$


Following this strategy, we encounter three terms in the S-matrix element:

a. The annihilation operator that destroys the particle with momentum $\mathbf{p}_1$
and the creation operator that creates the particle with momentum $\mathbf{p}_1'$ come
from the same $\varphi^3$ operator in Eq. (7.3.13), while the annihilation operator
that destroys the particle with momentum $\mathbf{p}_2$ and the creation operator that
creates the particle with momentum $\mathbf{p}_2'$ come from the other $\varphi^3$ operator.

b. The annihilation operator that destroys the particle with momentum $\mathbf{p}_1$ and
the creation operator that creates the particle with momentum $\mathbf{p}_2'$ come
from the same $\varphi^3$ operator in Eq. (7.3.13), while the annihilation operator
that destroys the particle with momentum $\mathbf{p}_2$ and the creation operator that
creates the particle with momentum $\mathbf{p}_1'$ come from the other $\varphi^3$ operator.

c. The annihilation operators that destroy both initial particles come from the
same $\varphi^3$ operator, while the creation operators that create both final particles
come from the other $\varphi^3$ operator.

In each case, one of the $\varphi(x)$ and one of the $\varphi(y)$ fields is left over in the
time-ordered product. Also, since the time-ordered product in Eq. (7.3.13) is
symmetric in the spacetime arguments $x$ and $y$, each of the above contributions
is a sum of two equal terms with $x$ and $y$ interchanged, so we can make an
arbitrary choice of which of the $\varphi^3$ operators in the three cases above is $\varphi^3(x)$
and which is $\varphi^3(y)$, and drop the factor 1/2 in (7.3.13). Instead, a factor 1/2
appears owing to the factors $1/\sqrt{2}$ given by Eq. (7.3.14) in the initial and final
states. The factor 1/6 in Eq. (7.3.12) is cancelled by the 3! ways of choosing
each of the fields in $\varphi^3$.
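
The bookkeeping of numerical factors just described can be verified by direct counting. The sketch below, with names invented for this example, enumerates the ordered choices of two distinct fields out of the three in one $\varphi^3$ factor and multiplies together the factors mentioned in the text, leaving aside the overall $(-ig)^2 = -g^2$.

```python
from fractions import Fraction
from itertools import permutations

def ways_per_vertex():
    """Ordered choices of two distinct fields among the three in one phi^3 factor."""
    return sum(1 for _ in permutations(range(3), 2))

series_factor = Fraction(1, 2) * Fraction(1, 6) ** 2   # 1/2 and (1/6)^2 in Eq. (7.3.13)
exchange = 2                                           # x <-> y gives two equal terms
state_norm = Fraction(1, 2)                            # (1/sqrt 2)^2 from Eq. (7.3.14)
coefficient = series_factor * exchange * ways_per_vertex() ** 2 * state_norm
print(ways_per_vertex(), coefficient)
```

Each vertex contributes 3! = 6 choices, and the product of all factors is 1/2, so each of the three contributions carries an overall coefficient $-g^2/2$ multiplying the four commutators and the leftover two-field vacuum expectation value.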


Here in turn are these three contributions: 


2 
a rs t 
Svppips = / d*x d*y g(lpan(x).4" P)I[Pan(y). 4" (P4)]o 


T {y(x), 9()} X [Pan(x), a" (Pi) IfGan(y), a" (p2)] Pog) 
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2(2)®/2E (pi) « 2E (po) - 2E(p1) + 2E(P) 


x / d*x d+ ye iP * e~'P2'Y oP! * e'P2'V (oy, T{o(x), (y)} 0) - 


(7.3.19) 
2 
& / Tr! 
Sy ppips = / d'x d*y g({¢an(y), 4" (P{)IYan(x), 4" (P5)1¥o 5 
T {y(x), 9(v)} X [Gan(x), a" (pi) IfGan(y), a" (P2) 1 Pog) 
2 
—&§ 


2(27r)8,/2(p1) - 2E(p2) - 2E(p}) - 2E 4) 


x fas d4 ye Pi Ye !P2* giP1* elP2') (hy, T{9(x), p(y) } Bo) - 


(7.3.20) 
2 
SO gaps =e ft ty g(t 9).44 (PI an(0).4(P5)100 
T {p(x), o(x)} X [Pan(x), 2" (P1) I Gan(y), a" (p2)] Pog) 
2 
—§ 


2(270)%\/2 (p1) - 2E(p2) - 2E(p}) - 2E 4) 


x [ats dt ye PV e~iP2Y giP1* g!P2* (hy, T {(x),0(y)} Bo) - 
(7.3.21) 


These three contributions are symbolized in three of what are known as Feyn- 
man diagrams, shown here in Figure 7.1. 


Calculation of the Propagator 


Evidently, we need to calculate the vacuum expectation value 


$$
\big(\Phi_0, T\{\varphi(x), \varphi(y)\}\Phi_0\big),
$$

which is known as the propagator of the field $\varphi$. For this purpose, we again
write the scalar field as in Eq. (7.3.16) and use the fact that $\varphi_{\rm an}$ acting to
the right on $\Phi_0$ and $\varphi_{\rm an}^\dagger$ acting to the left, where its adjoint acts on
$\Phi_0$ as $\varphi_{\rm an}$, both vanish. This gives
Figure 7.1 Feynman diagrams for the scattering of neutral scalar particles.
Here the lines coming into the diagrams from below or going out from the
diagrams above represent particles in initial and final states, respectively; the
vertices represent an interaction and are proportional to $\varphi^3$; the line connecting
vertices represents the propagator.


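$$
\big(\Phi_0, \varphi(x)\varphi(y)\Phi_0\big)
= \big(\Phi_0, \varphi_{\rm an}(x)\varphi_{\rm an}^\dagger(y)\Phi_0\big)
= \big(\Phi_0, [\varphi_{\rm an}(x), \varphi_{\rm an}^\dagger(y)]\Phi_0\big)
= \Delta_+(x-y) ,
\tag{7.3.22}
$$

where $\Delta_+$ is the function

$$
\Delta_+(z) \equiv \int \frac{d^3p}{(2\pi)^3\, 2E(\mathbf{p})}\,
\exp\big(i\mathbf{p}\cdot\mathbf{z} - iE(\mathbf{p})z^0\big) .
\tag{7.3.23}
$$

The propagator is then

$$
\big(\Phi_0, T\{\varphi(x), \varphi(y)\}\Phi_0\big)
= \theta(x^0-y^0)\,\Delta_+(x-y) + \theta(y^0-x^0)\,\Delta_+(y-x) ,
\tag{7.3.24}
$$

where $\theta$ is the step function

$$
\theta(z^0) \equiv \begin{cases} 1 & z^0 > 0 \\ 0 & z^0 < 0 . \end{cases}
\tag{7.3.25}
$$

What we need in Eqs. (7.3.19)-(7.3.21) is the Fourier transform of the
propagator:

$$
\begin{aligned}
\Delta(q) &\equiv \int e^{-iq\cdot z}\,\big[\theta(z^0)\Delta_+(z) + \theta(-z^0)\Delta_+(-z)\big]\, d^4z \\
&= \frac{1}{2E(\mathbf{q})}\left[ \int_0^{\infty} dz^0\, \exp\big[(iq^0 - iE(\mathbf{q}))z^0\big]
+ \int_{-\infty}^{0} dz^0\, \exp\big[(iq^0 + iE(\mathbf{q}))z^0\big] \right] .
\end{aligned}
$$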

We can give meaning to these integrals by inserting convergence factors
$\exp(-\epsilon z^0)$ in the first integral and $\exp(+\epsilon z^0)$ in the second integral,
where $\epsilon$ is a positive infinitesimal. The integrals are then elementary:

$$
\Delta(q) = \frac{1}{2E(\mathbf{q})}\left[ \frac{-1}{iq^0 - iE(\mathbf{q}) - \epsilon}
+ \frac{1}{iq^0 + iE(\mathbf{q}) + \epsilon} \right] ,
$$

so, for $\epsilon \to 0$,

$$
\Delta(q) \to \frac{-i}{q^2 + m^2 - 2i\epsilon E(\mathbf{q})} ,
\tag{7.3.26}
$$

where $q^2 \equiv \eta_{\mu\nu}q^\mu q^\nu = \mathbf{q}^2 - (q^0)^2$. The term $-2i\epsilon E(\mathbf{q})$ in the
denominator, though infinitesimal, is important in more complicated calculations
where we have to integrate $\Delta(q)$ over a range of its argument in which $q^2 + m^2$
can vanish. (For this purpose, it is only important that it is a negative imaginary
infinitesimal, and so is usually written simply as $-i\epsilon$.) In our calculation the
integrals over $x$ and $y$ fix the argument of $\Delta$, and we can drop the term
$-2i\epsilon E(\mathbf{q})$ in the denominator.⁸
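The limit leading to Eq. (7.3.26) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the two time integrals are evaluated in closed form with a small but finite $\epsilon$ and then compared with $-i/(q^2 + m^2 - 2i\epsilon E)$.

```python
# Illustrative check of Eq. (7.3.26); function names are assumptions, not from the text.

def delta_from_integrals(q0, qvec_sq, m, eps):
    """Sum of the two elementary time integrals, with convergence factors exp(-/+ eps z0)."""
    energy = (qvec_sq + m * m) ** 0.5
    # integral over z0 > 0 of exp[(i q0 - i E - eps) z0]
    first = 1.0 / (eps - 1j * (q0 - energy))
    # integral over z0 < 0 of exp[(i q0 + i E + eps) z0]
    second = 1.0 / (eps + 1j * (q0 + energy))
    return (first + second) / (2.0 * energy)

def delta_closed_form(q0, qvec_sq, m, eps):
    """Right-hand side of Eq. (7.3.26), with q^2 = |q_vec|^2 - (q0)^2."""
    energy = (qvec_sq + m * m) ** 0.5
    q_sq = qvec_sq - q0 * q0
    return -1j / (q_sq + m * m - 2j * eps * energy)

if __name__ == "__main__":
    a = delta_from_integrals(0.3, 1.0, 1.0, 1e-6)
    b = delta_closed_form(0.3, 1.0, 1.0, 1e-6)
    print(a, b, abs(a - b))
```

The two expressions differ only by terms that vanish with $\epsilon$, which is why the limit in the text is harmless whenever $q^2 + m^2$ stays away from zero.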

To do the integrals over $x$ and $y$ in Eqs. (7.3.19)-(7.3.21) we set
$x = (x - y) + y$ and integrate separately over $x - y$ and $y$. In each term the
integral over $y$ then simply gives a factor $(2\pi)^4\delta^4(p'_1 + p'_2 - p_1 - p_2)$, which
guarantees the conservation of energy and momentum. The integrals over $x - y$
are given by Eq. (7.3.26), and the sum of (7.3.19), (7.3.20), and (7.3.21) then
gives the total second-order scattering amplitude:

$$
S_{\mathbf{p}'_1\mathbf{p}'_2,\,\mathbf{p}_1\mathbf{p}_2}
= \frac{i g^2 (2\pi)^4 \delta^4(p'_1 + p'_2 - p_1 - p_2)}
{2(2\pi)^6\sqrt{2E(\mathbf{p}_1)\cdot 2E(\mathbf{p}_2)\cdot 2E(\mathbf{p}'_1)\cdot 2E(\mathbf{p}'_2)}}
\times\left[ \frac{1}{(p_1 - p'_1)^2 + m^2} + \frac{1}{(p_1 - p'_2)^2 + m^2} + \frac{1}{(p_1 + p_2)^2 + m^2} \right] .
\tag{7.3.27}
$$
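As a small illustration of Eq. (7.3.27), the sketch below evaluates the sum of the three propagator factors for given on-shell momenta, using the convention $q^2 = \mathbf{q}^2 - (q^0)^2$ of Eq. (7.3.26). The helper names and the sample kinematics are assumptions chosen for the example, not taken from the text.

```python
# Illustrative evaluation of the bracket in Eq. (7.3.27); names are hypothetical.

def on_shell(p_vec, m):
    """Four-momentum (px, py, pz, E) with E = sqrt(|p|^2 + m^2)."""
    px, py, pz = p_vec
    return (px, py, pz, (px * px + py * py + pz * pz + m * m) ** 0.5)

def minkowski_sq(q):
    """q^2 = |q_vec|^2 - (q^0)^2 for q = (qx, qy, qz, q0)."""
    qx, qy, qz, q0 = q
    return qx * qx + qy * qy + qz * qz - q0 * q0

def propagator_sum(p1, p2, p1_out, p2_out, m):
    """Sum of the three terms 1/(q^2 + m^2) appearing in Eq. (7.3.27)."""
    diff = lambda a, b: tuple(x - y for x, y in zip(a, b))
    total = tuple(x + y for x, y in zip(p1, p2))
    return (1.0 / (minkowski_sq(diff(p1, p1_out)) + m * m)
            + 1.0 / (minkowski_sq(diff(p1, p2_out)) + m * m)
            + 1.0 / (minkowski_sq(total) + m * m))

if __name__ == "__main__":
    m = 1.0
    # Center-of-mass scattering through 90 degrees with |p| = 1 (sample values).
    p1, p2 = on_shell((0.0, 0.0, 1.0), m), on_shell((0.0, 0.0, -1.0), m)
    p1_out, p2_out = on_shell((1.0, 0.0, 0.0), m), on_shell((-1.0, 0.0, 0.0), m)
    print(propagator_sum(p1, p2, p1_out, p2_out, m))
```

For these sample momenta the two exchange terms each equal 1/3 and the third term equals -1/7, so the printed value is 11/21.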
8 The circumstance that all four-momenta are fixed by the delta functions generated by integrals over
spacetime coordinates is true of all tree diagrams, that is, diagrams like Figure 7.1 that can be
disconnected by cutting any single internal line. The contributions to the S-matrix of diagrams with L loops,
whose disconnection requires the cutting of a minimum of L + 1 internal lines, involve integrals over L
four-momenta.
The appearance here of the term $1/[(p_1 - p'_1)^2 + m^2]$ may evoke the
recollection of an earlier result. In Eq. (5.6.23) we found in the Born approximation
that a potential proportional to $\exp(-\kappa r)/r$ gives a scattering amplitude
proportional to $1/[(\mathbf{k} - \mathbf{k}')^2 + \kappa^2]$, where $\mathbf{k}$ and $\mathbf{k}'$ are the initial and
final wave numbers of the scattered particle, or, in units with $\hbar = 1$, the initial
and final momenta. There is no energy term in the denominator in Eq. (5.6.23)
because the scattering was supposed there to be due to an external potential that
can transfer momentum but not energy. Aside from that, the comparison shows
that the exchange of a scalar particle of mass $m$ creates effects, like those of
a Yukawa potential, proportional to $\exp(-\kappa r)/r$, with $\kappa = m$ in natural units
or, in cgs units, with $\kappa = mc/\hbar$. This was the point made by Yukawa⁹ in 1935,
which led him to the prediction of a "meson," with mass intermediate between
the electron and the proton, to carry the nuclear force.
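The relation $\kappa = mc/\hbar$ fixes the range $1/\kappa$ of the force once the mass of the exchanged particle is known. The following sketch evaluates it; the charged-pion mass of about 139.57 MeV and the value $\hbar c \approx 197.327$ MeV fm are inputs assumed here for illustration, and the function names are hypothetical.

```python
import math

# Assumed inputs for illustration: hbar*c in MeV fm and the charged-pion mass in MeV.
HBAR_C_MEV_FM = 197.327
PION_MASS_MEV = 139.57

def yukawa_range_fm(mass_mev):
    """Range 1/kappa = hbar / (m c), in femtometers, for a mass given in MeV/c^2."""
    return HBAR_C_MEV_FM / mass_mev

def yukawa_potential(r_fm, mass_mev, strength=1.0):
    """Potential proportional to exp(-kappa r)/r; 'strength' is an arbitrary overall factor."""
    kappa = mass_mev / HBAR_C_MEV_FM
    return strength * math.exp(-kappa * r_fm) / r_fm

if __name__ == "__main__":
    print(yukawa_range_fm(PION_MASS_MEV))         # range in fm
    print(yukawa_potential(1.0, PION_MASS_MEV))   # value at r = 1 fm, arbitrary units
```

A mass of this size gives a range of roughly 1.4 fm, comparable to nuclear dimensions.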


7.4 Antiparticles, Spin, Statistics 


The real scalar field discussed in the previous two sections could not describe a
particle that carries any conserved quantity, such as electric charge. If the
annihilation part $\varphi_{\rm an}(x)$ of the field given by Eq. (7.3.17) destroys a certain
amount of charge then its adjoint $\varphi_{\rm an}^\dagger(x)$ would create the same quantity of
charge, and no interaction such as $\varphi^3$ constructed from the real field
$\varphi = \varphi_{\rm an} + \varphi_{\rm an}^\dagger$ could possibly conserve this quantity. We could construct
interactions that conserve charge by separating $\varphi_{\rm an}$ and $\varphi_{\rm an}^\dagger$ and taking the
interaction to include equal numbers of factors of each, such as
$\varphi_{\rm an}^2\varphi_{\rm an}^{\dagger 2}$, but then we would not be able
to preserve Lorentz invariance. The commutator of $\varphi_{\rm an}(x)$ and $\varphi_{\rm an}^\dagger(y)$ is the
function $\Delta_+(x - y)$ given by Eq. (7.3.23), which does not vanish for $x - y$
space-like, so an interaction such as $\varphi_{\rm an}^2\varphi_{\rm an}^{\dagger 2}$ that treats $\varphi_{\rm an}$ and
$\varphi_{\rm an}^\dagger$ separately would not satisfy the condition (7.3.10), which we have seen
is necessary for Lorentz invariance.

So, what to do? The only known way of restoring Lorentz invariance for 
charged particles while preserving charge conservation is to take the free field 
to be complex, the sum of a term that annihilates a particle and another term 
that creates its antiparticle, a particle with the opposite value of electric charge 
(and of all other conserved quantities) but the same spin and mass. For spinless 
particles this field takes the form 


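9 H. Yukawa, Proc. Phys.-Math. Soc. Japan 17, 48 (1935). This article is reprinted in Beyer, Foundations of
Nuclear Physics, listed in the bibliography.

$$
\varphi(\mathbf{x}, t) = \int \frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}
\Big[ a(\mathbf{p}) \exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)
+ b^\dagger(\mathbf{p}) \exp\big(-i\mathbf{p}\cdot\mathbf{x} + iE(\mathbf{p})t\big) \Big]
\tag{7.4.1}
$$

where

$$
[a(\mathbf{p}), a^\dagger(\mathbf{p}')] = [b(\mathbf{p}), b^\dagger(\mathbf{p}')] = \delta^3(\mathbf{p} - \mathbf{p}') ,
\tag{7.4.2}
$$

$$
[a(\mathbf{p}), a(\mathbf{p}')] = [b(\mathbf{p}), b(\mathbf{p}')] = [a(\mathbf{p}), b(\mathbf{p}')] = 0 ,
\tag{7.4.3}
$$

$$
[a^\dagger(\mathbf{p}), b(\mathbf{p}')] = [b^\dagger(\mathbf{p}), a(\mathbf{p}')] = 0 .
\tag{7.4.4}
$$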


In particular, the commutator of $\varphi(\mathbf{x}, t)$ with $\varphi^\dagger(\mathbf{y}, t)$ vanishes for
space-like $x - y$ because of the same sort of cancellation that we encountered for
real scalar fields. Both terms in $\varphi$ change the electric charge (or any other
conserved quantity) by the same amount, so interactions conserve this charge if
they contain an equal number of factors of $\varphi$ and $\varphi^\dagger$. This theory was
presented in 1934 by Pauli and Weisskopf,¹⁰ in order to contradict Dirac's view
that antiparticles arise as holes in a sea of negative-energy particles.
Antiparticles are indispensable for Lorentz invariance in any quantum field theory
of particles that carry a conserved quantity, such as electric charge, even if the
particles are bosons, which do not satisfy the Pauli exclusion principle and so
could not form a stable sea of negative-energy particles. Where particles carry no
conserved quantity, as in the previous sections, these particles can be said to be
their own antiparticles. This is the case for the neutral pion and for the $Z^0$
particle.

This sort of free complex field theory can be derived as a consequence of a 
free-field Lagrangian density, of the form 


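$$
\mathcal{L} = -\eta^{\mu\nu}\frac{\partial\varphi^\dagger}{\partial x^\mu}\frac{\partial\varphi}{\partial x^\nu} - m^2\varphi^\dagger\varphi .
\tag{7.4.5}
$$

The Euler-Lagrange field equations are again

$$
(\Box - m^2)\varphi = 0 ,
\tag{7.4.6}
$$

whose general complex solution is Eq. (7.4.1), with spacetime-independent
operator coefficients $a$ and $b^\dagger$. But now the canonical conjugate to $\varphi$ is the
time derivative of an independent canonical variable, the adjoint:

$$
\pi(\mathbf{x}, t) = \frac{\partial\varphi^\dagger(\mathbf{x}, t)}{\partial t} ,
\tag{7.4.7}
$$

while $\partial\varphi/\partial t$ is the canonical conjugate to $\varphi^\dagger$. The canonical commutation
relations (7.1.10) then yield the commutation relations (7.4.2)-(7.4.4).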


10 W. Pauli and V. F. Weisskopf, Helv. Phys. Acta 7, 709 (1934). 
The particles described so far are bosons. The multi-particle state vectors
here are

$$
a^\dagger(\mathbf{p}_1)a^\dagger(\mathbf{p}_2)a^\dagger(\mathbf{p}_3)\cdots\, b^\dagger(\mathbf{p}'_1)b^\dagger(\mathbf{p}'_2)b^\dagger(\mathbf{p}'_3)\cdots\,\Psi_0 ,
\tag{7.4.8}
$$

where $\Psi_0$ is the vacuum, satisfying $a(\mathbf{p})\Psi_0 = b(\mathbf{p})\Psi_0 = 0$. By taking the
adjoints of the commutation relations (7.4.3), we see that these particles are
bosons; the states (7.4.8) are completely symmetric under interchanges of the
labels $\mathbf{p}_1$, $\mathbf{p}_2$, etc. of the particles and under interchanges of the labels
$\mathbf{p}'_1$, $\mathbf{p}'_2$, etc. of the antiparticles.

Suppose we wanted to construct a theory of spinless neutral fermions. We
could suppose that all the commutators $[A, B] \equiv [A, B]_- \equiv AB - BA$
that we previously derived from the canonical commutation relations are now
replaced with anticommutators, $[A, B]_+ \equiv AB + BA$. For instance, we can try
introducing a real scalar field like (7.3.16):

$$
\varphi(\mathbf{x}, t) = \varphi_{\rm an}(\mathbf{x}, t) + \varphi_{\rm an}^\dagger(\mathbf{x}, t) , \qquad
\varphi_{\rm an}(\mathbf{x}, t) \equiv \int \frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}
\exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)\, a(\mathbf{p}) ,
$$

but we now suppose that the annihilation and creation operators satisfy the
anticommutation rules:

$$
[a(\mathbf{p}), a^\dagger(\mathbf{p}')]_+ = \delta^3(\mathbf{p} - \mathbf{p}') ,
\tag{7.4.9}
$$

$$
[a(\mathbf{p}), a(\mathbf{p}')]_+ = [a^\dagger(\mathbf{p}), a^\dagger(\mathbf{p}')]_+ = 0 .
\tag{7.4.10}
$$

The anticommutation relations (7.4.10) imply the complete antisymmetry of the
multi-particle state vector

$$
a^\dagger(\mathbf{p}_1)a^\dagger(\mathbf{p}_2)a^\dagger(\mathbf{p}_3)\cdots\,\Psi_0
\tag{7.4.11}
$$

under interchange of the labels $\mathbf{p}_1$, $\mathbf{p}_2$, etc. of the particles, as required for
fermions.
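A single momentum mode already shows how the anticommutation rules (7.4.9) and (7.4.10) enforce the exclusion principle. The sketch below is a toy model, not the continuum theory: it represents one mode by 2 x 2 matrices in the basis (empty, occupied), an assumption made purely for illustration, and checks that $a a^\dagger + a^\dagger a = 1$ while $(a^\dagger)^2 = 0$.

```python
# Toy single-mode model of fermionic operators; the matrix representation is an
# illustrative assumption (basis index 0 = empty state, index 1 = occupied state).

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matadd(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

A = [[0, 1],
     [0, 0]]        # annihilation operator: occupied -> empty
A_DAG = [[0, 0],
         [1, 0]]    # creation operator: empty -> occupied

ANTICOMMUTATOR = matadd(matmul(A, A_DAG), matmul(A_DAG, A))  # a a† + a† a
A_DAG_SQUARED = matmul(A_DAG, A_DAG)                         # creating two particles in one mode

if __name__ == "__main__":
    print(ANTICOMMUTATOR)   # unit matrix
    print(A_DAG_SQUARED)    # zero matrix
```

The vanishing of the squared creation operator is the one-mode version of the antisymmetry of the state (7.4.11): no two particles can share the same label.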

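In place of the vanishing of the equal-time commutator $[\varphi(\mathbf{x}, t), \varphi(\mathbf{y}, t)]$ we
now have

$$
\begin{aligned}
[\varphi(\mathbf{x}, t), \varphi(\mathbf{y}, t)]_+
&= [\varphi_{\rm an}(\mathbf{x}, t), \varphi_{\rm an}^\dagger(\mathbf{y}, t)]_+ + [\varphi_{\rm an}^\dagger(\mathbf{x}, t), \varphi_{\rm an}(\mathbf{y}, t)]_+ \\
&= \Delta_+(\mathbf{x} - \mathbf{y}, 0) + \Delta_+(\mathbf{y} - \mathbf{x}, 0) ,
\end{aligned}
$$

where $\Delta_+(\mathbf{x} - \mathbf{y}, x^0 - y^0)$ is the function (7.3.23), which at equal times is
non-zero and even:

$$
\Delta_+(\mathbf{x} - \mathbf{y}, 0) = \int \frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\, e^{i\mathbf{p}\cdot(\mathbf{x} - \mathbf{y})}
= \frac{m}{4\pi^2|\mathbf{x} - \mathbf{y}|}\, K_1(m|\mathbf{x} - \mathbf{y}|) .
$$

We see that here the two terms in $[\varphi(\mathbf{x}, t), \varphi(\mathbf{y}, t)]_+$ do not cancel but add,
unlike the bosonic case. It is in fact impossible to construct scalar fields that
anticommute at equal times for spinless fermions.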
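Though it is not possible here to go into so much detail, the same sort of
analysis leads to the general conclusion cited in Section 5.5, that integer-spin
particles (including spinless particles) must be bosons, while particles with
half-odd-integer spin must be fermions. The free fields for spinning particles take
the general form

$$
\varphi_n(\mathbf{x}, t) = \varphi_{n,{\rm an}}(\mathbf{x}, t) + \varphi_{n,{\rm cr}}(\mathbf{x}, t) ,
\tag{7.4.12}
$$

$$
\varphi_{n,{\rm an}}(\mathbf{x}, t) = \sum_\sigma \int \frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\,
u_n(\mathbf{p}, \sigma)\, \exp\big(i\mathbf{p}\cdot\mathbf{x} - iE(\mathbf{p})t\big)\, a(\mathbf{p}, \sigma) ,
\tag{7.4.13}
$$

$$
\varphi_{n,{\rm cr}}(\mathbf{x}, t) = \sum_\sigma \int \frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}\,
v_n(\mathbf{p}, \sigma)\, \exp\big(-i\mathbf{p}\cdot\mathbf{x} + iE(\mathbf{p})t\big)\, b^\dagger(\mathbf{p}, \sigma) .
\tag{7.4.14}
$$

Here $a(\mathbf{p}, \sigma)$ is the operator that annihilates a particle of momentum $\mathbf{p}$ and spin
3-component $\sigma$; $b^\dagger(\mathbf{p}, \sigma)$ is the operator that creates its antiparticle (so that
both terms in $\varphi_n$ have the same effect on the charge and all other conserved
quantities); and $u_n(\mathbf{p}, \sigma)$ and $v_n(\mathbf{p}, \sigma)$ are functions about which more later. For
neutral particles that are their own antiparticles, $a(\mathbf{p}, \sigma) = b(\mathbf{p}, \sigma)$. For bosons
or fermions, the operators $a$ and $b$ satisfy the commutation or anticommutation
relations

$$
[a(\mathbf{p}, \sigma), a^\dagger(\mathbf{p}', \sigma')]_\mp = [b(\mathbf{p}, \sigma), b^\dagger(\mathbf{p}', \sigma')]_\mp = \delta_{\sigma\sigma'}\,\delta^3(\mathbf{p} - \mathbf{p}') ,
\tag{7.4.15}
$$

$$
[a(\mathbf{p}, \sigma), a(\mathbf{p}', \sigma')]_\mp = [b(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_\mp = [a(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_\mp = 0 ,
\tag{7.4.16}
$$

$$
[a^\dagger(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_\mp = [b^\dagger(\mathbf{p}, \sigma), a(\mathbf{p}', \sigma')]_\mp = 0 ,
\tag{7.4.17}
$$

the $\mp$ signs being minus, denoting commutators, for bosons or plus, denoting
anticommutators, for fermions.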

The functions $u_n(\mathbf{p}, \sigma)$ and $v_n(\mathbf{p}, \sigma)$ are governed by what is assumed for the
Lorentz-transformation property of the fields. Under a Lorentz transformation
$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu x^\nu$, the various fields $\varphi_n(x)$ undergo various matrix
transformations¹¹

$$
\varphi_n(x) \to \varphi_n^\Lambda(\Lambda x) = \sum_m D_{nm}(\Lambda)\,\varphi_m(x) ,
\tag{7.4.18}
$$

11 These transformations are often written as actions of a quantum-mechanical operator $U(\Lambda)$, as
$U(\Lambda)\varphi_n(x)U^{-1}(\Lambda) = \sum_m D^{-1}_{nm}(\Lambda)\,\varphi_m(\Lambda x)$. This is the same as Eq. (7.4.18) if we identify
$\varphi_n^\Lambda(x) \equiv U^{-1}(\Lambda)\varphi_n(x)U(\Lambda)$.
so that for an observer who uses coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$ the field is related
by a matrix $D$ to the field for an observer who uses coordinates $x^\mu$ at the same
spacetime point, a point that is thus given different coordinates by the two
observers. When we perform two successive Lorentz transformations $\Lambda_1$ and
then $\Lambda_2$, the effect on the fields is

$$
\varphi_n(x) \to \varphi_n^{\Lambda_1}(\Lambda_1 x) = \sum_l D_{nl}(\Lambda_1)\,\varphi_l(x)
\to \sum_l D_{nl}(\Lambda_1)\,\varphi_l^{\Lambda_2}(\Lambda_2 x) = \sum_{ml} D_{nl}(\Lambda_1)\, D_{lm}(\Lambda_2)\,\varphi_m(x) ,
$$

while the effect of the compound Lorentz transformation
$(\Lambda_1\Lambda_2)^\mu{}_\nu = \Lambda_1{}^\mu{}_\rho\,\Lambda_2{}^\rho{}_\nu$ is

$$
\varphi_n(x) \to \varphi_n^{\Lambda_1\Lambda_2}(\Lambda_1\Lambda_2 x) = \sum_m D_{nm}(\Lambda_1\Lambda_2)\,\varphi_m(x) .
$$

These transformations must be the same, so

$$
D(\Lambda_1)D(\Lambda_2) = D(\Lambda_1\Lambda_2) ,
\tag{7.4.19}
$$

where $[D(\Lambda_1)D(\Lambda_2)]_{nm}$ is the usual matrix product:

$$
[D(\Lambda_1)D(\Lambda_2)]_{nm} = \sum_l D_{nl}(\Lambda_1)\, D_{lm}(\Lambda_2) .
$$

Such matrices are said to form a representation of the group of Lorentz transfor- 
mations. We classify the various kinds of field according to the representation 
they furnish of the Lorentz group. 
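For the four-vector representation, where $D(\Lambda)$ is $\Lambda$ itself, the group property can be checked directly. The sketch below builds a boost and a rotation, verifies that each satisfies $\Lambda^{\rm T}\eta\Lambda = \eta$ with $\eta = {\rm diag}(1, 1, 1, -1)$, and confirms that applying them in succession agrees with applying their matrix product. The coordinate ordering $(x^1, x^2, x^3, x^0)$ and all function names are choices made for this illustration.

```python
import math

# Diagonal of eta in the coordinate order (x1, x2, x3, x0); an ordering chosen for this sketch.
ETA = [1.0, 1.0, 1.0, -1.0]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def apply(lam, x):
    """Transform the components of a four-vector: x'^mu = Lambda^mu_nu x^nu."""
    return [sum(lam[i][k] * x[k] for k in range(4)) for i in range(4)]

def boost_x1(rapidity):
    c, s = math.cosh(rapidity), math.sinh(rapidity)
    return [[c, 0.0, 0.0, s],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [s, 0.0, 0.0, c]]

def rotation_x3(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def is_lorentz(lam, tol=1e-12):
    """Check (Lambda^T eta Lambda)_{mn} = eta_{mn}."""
    for m in range(4):
        for n in range(4):
            value = sum(lam[k][m] * ETA[k] * lam[k][n] for k in range(4))
            target = ETA[m] if m == n else 0.0
            if abs(value - target) > tol:
                return False
    return True

if __name__ == "__main__":
    boost, rot = boost_x1(0.8), rotation_x3(0.6)
    x = [1.0, 2.0, 3.0, 4.0]
    print(is_lorentz(boost), is_lorentz(rot), is_lorentz(matmul(boost, rot)))
    print(apply(boost, apply(rot, x)), apply(matmul(boost, rot), x))
```

Other representations obey the same multiplication rule with different matrices $D(\Lambda)$; only the vector case is simple enough to check by hand in this way.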

It is always possible to write the Lagrangian density in terms of fields that
are irreducible, in the sense that their components cannot be divided into sets
that, under Lorentz (and perhaps space inversion) transformations, transform
only into linear combinations of the field components in the same set. Among
these irreducible fields are a single scalar field for spin zero, for which $D(\Lambda)$
is the unit matrix, or a single four-vector field for spin one, for which $D(\Lambda)$ is
$\Lambda$ itself. For spin 1/2 there is the four-component Dirac field, briefly described
in the appendix to this section. For our present purposes, the important thing
about irreducible fields is that the coefficient functions $u_n(\mathbf{p}, \sigma)$ and $v_n(\mathbf{p}, \sigma)$
are uniquely determined up to constant factors by what is assumed for the
Lorentz-transformation properties of the fields and the spin of the particles.

As discussed in the previous section, for the Lorentz invariance of the theory
it is not enough that the interaction Hamiltonian density $\mathcal{H}'$ should be a scalar; it
also has to satisfy the condition (7.3.11), that $\mathcal{H}'(x)$ should commute with $\mathcal{H}'(y)$
at equal times $x^0 = y^0$. For this, it is necessary that $\mathcal{H}'$ should be formed from
bosonic fields that all commute with each other at equal times, plus some even
number (perhaps zero) of fermionic fields that anticommute with each other at
equal times (and commute with the bosonic fields at equal times).
For particles that are not their own antiparticles, the commutators or
anticommutators of the $\varphi_n$ with each other and of the $\varphi_n^\dagger$ with each other
trivially vanish. On the other hand, the equal-time commutator or anticommutator
of any field with its adjoint is

$$
[\varphi_n(\mathbf{x}, t), \varphi_m^\dagger(\mathbf{y}, t)]_\mp = \Delta_{nm}(\mathbf{x} - \mathbf{y}) \mp \tilde{\Delta}_{nm}(\mathbf{y} - \mathbf{x}) ,
\tag{7.4.20}
$$

where

$$
\Delta_{nm}(\mathbf{x} - \mathbf{y}) \equiv \sum_\sigma \int \frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\, u_n(\mathbf{p}, \sigma)\, u_m^*(\mathbf{p}, \sigma)\, e^{i\mathbf{p}\cdot(\mathbf{x} - \mathbf{y})} ,
\tag{7.4.21}
$$

$$
\tilde{\Delta}_{nm}(\mathbf{x} - \mathbf{y}) \equiv \sum_\sigma \int \frac{d^3p}{2E(\mathbf{p})(2\pi)^3}\, v_n(\mathbf{p}, \sigma)\, v_m^*(\mathbf{p}, \sigma)\, e^{i\mathbf{p}\cdot(\mathbf{x} - \mathbf{y})} .
\tag{7.4.22}
$$

The first and second terms on the right of Eq. (7.4.20) come respectively
from the commutator or anticommutator of the annihilation part of $\varphi_n(\mathbf{x}, t)$
with the creation part of $\varphi_m^\dagger(\mathbf{y}, t)$ and from the commutator or anticommutator
of the creation part of $\varphi_n(\mathbf{x}, t)$ with the annihilation part of $\varphi_m^\dagger(\mathbf{y}, t)$. (The
crucial $\mp$ sign that distinguishes bosons from fermions appears in the second
term of Eq. (7.4.20) because this term comes from the part of the commutator
or anticommutator of $\varphi_n$ with $\varphi_m^\dagger$ in which $b^\dagger$ appears to the left of $b$.) Detailed
calculations beyond the scope of this book show that¹²

$$
\tilde{\Delta}_{nm}(\mathbf{y} - \mathbf{x}) = (-1)^{2j}\,|\lambda|^2\,\Delta_{nm}(\mathbf{x} - \mathbf{y}) ,
\tag{7.4.23}
$$

where $j$ is the particle spin and $\lambda$ depends on how the $u_n$ and $v_n$ are normalized.
(If we multiply $u_n$ and $v_n$ by factors $\alpha$ and $\beta$ then $\lambda$ is changed by a factor $\beta/\alpha$.)
For equal-time commutators or anticommutators of fields and their adjoints to
vanish, the two terms in Eq. (7.4.20) must cancel. For this we need

$$
|\lambda|^2(-1)^{2j} = \pm 1 ,
$$

with the top sign for bosons and the bottom sign for fermions. This requires that
$|\lambda| = 1$, which can always be arranged by adjusting the relative normalization of
$u_n$ and $v_n$, and thereby imposes a relation between the strengths of interactions
of particles and antiparticles. But with $|\lambda| = 1$, we also need

$$
(-1)^{2j} = \pm 1 .
\tag{7.4.24}
$$

This is the famous connection between spin and statistics:¹³ particles with $j$ an
integer are bosons, and particles with $j$ a half odd integer are fermions.
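Condition (7.4.24) amounts to a one-line rule: the sign of $(-1)^{2j}$ decides the statistics. A minimal sketch follows, in which the function name is hypothetical and exact fractions are used so that half-odd-integer spins are represented without rounding.

```python
from fractions import Fraction

def statistics_from_spin(j):
    """Return 'boson' if (-1)^(2j) = +1 and 'fermion' if (-1)^(2j) = -1."""
    two_j = Fraction(j) * 2
    if two_j.denominator != 1 or two_j < 0:
        raise ValueError("spin must be a non-negative integer or half-odd-integer")
    return "boson" if int(two_j) % 2 == 0 else "fermion"

if __name__ == "__main__":
    for spin in (0, Fraction(1, 2), 1, Fraction(3, 2), 2):
        print(spin, statistics_from_spin(spin))
```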


12 For a textbook treatment, see e.g. S. Weinberg, The Quantum Theory of Fields, Vol. 1 (Cambridge 
University Press, Cambridge, UK, 1995), Section 5.7. 
13 M. Fierz, Helv. Phys. Acta 12, 3 (1939); W. Pauli, Phys. Rev. 58, 716 (1940).
In 1928 Dirac introduced a relativistic wave equation¹⁴ that he thought would
provide the basis for a formulation of quantum mechanics consistent with
special relativity. About this program he was wrong; the successful relativistic
formulation of quantum mechanics turned out to take the form of quantum field
theory. But his equation survives as the field equation of the quantum fields for
particles of spin 1/2, and their antiparticles, and leads to some of the same
consequences, such as formulas for the fine structure of atomic spectra. This
appendix provides just a sketch of Dirac's formalism, skipping most proofs.

The Dirac field is a set of four operators $\psi_n(x)$, characterized by their Lorentz
transformations: for $x \to \Lambda x$,

$$
\psi_n(x) \to \psi_n^\Lambda(\Lambda x) = \sum_m D_{nm}(\Lambda)\,\psi_m(x) ,
\tag{7.4.25}
$$

with the matrix $D(\Lambda)$ furnishing a representation of the Lorentz group with the
special property that

$$
D^{-1}(\Lambda)\,\gamma^\mu\, D(\Lambda) = \Lambda^\mu{}_\nu\,\gamma^\nu ,
\tag{7.4.26}
$$

where the $\gamma^\mu$ are a set of four $4 \times 4$ matrices satisfying the anticommutation
relations

$$
\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2\eta^{\mu\nu} = 2 \times
\begin{cases} +1 & \mu = \nu = 1, 2, 3 \\ -1 & \mu = \nu = 0 \\ 0 & \mu \neq \nu . \end{cases}
\tag{7.4.27}
$$

This allows a Lorentz-invariant first-order free-field equation for mass $m$:

$$
\left( \gamma^\mu\frac{\partial}{\partial x^\mu} + m \right)\psi(x) = 0 .
\tag{7.4.28}
$$

Using the commutativity of partial derivatives and the anticommutation rules
(7.4.27), we see that Eq. (7.4.28) has the consequence

$$
0 = \left( \gamma^\nu\frac{\partial}{\partial x^\nu} - m \right)\left( \gamma^\mu\frac{\partial}{\partial x^\mu} + m \right)\psi = (\Box - m^2)\psi .
$$

For this reason, Dirac thought of his equation as a sort of square root of the
relativistic Schrödinger (or Klein-Gordon) free-particle equation $(\Box - m^2)\psi = 0$.
The general solution of Eq. (7.4.28) is

$$
\psi_n(x) = \sum_\sigma \int \frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(\mathbf{p})}}
\Big[ e^{ip\cdot x}\, u_n(\mathbf{p}, \sigma)\, a(\mathbf{p}, \sigma) + e^{-ip\cdot x}\, v_n(\mathbf{p}, \sigma)\, b^\dagger(\mathbf{p}, \sigma) \Big] ,
\tag{7.4.29}
$$


14 P. A. M. Dirac, Proc. Roy. Soc. (London) A117, 610 (1928). 
where $p\cdot x \equiv \eta_{\mu\nu}p^\mu x^\nu$; $p^0 \equiv E(\mathbf{p}) \equiv +(\mathbf{p}^2 + m^2)^{1/2}$; the $u_n(\mathbf{p}, \sigma)$ and
$v_n(\mathbf{p}, \sigma)$ are independent solutions of the equations

$$
(i\eta_{\mu\nu}\gamma^\mu p^\nu + m)\,u = 0 ,
\tag{7.4.30}
$$

$$
(-i\eta_{\mu\nu}\gamma^\mu p^\nu + m)\,v = 0 ;
\tag{7.4.31}
$$

and $a(\mathbf{p}, \sigma)$ and $b(\mathbf{p}, \sigma)$ are operator coefficients, with $\sigma$ labeling the
independent solutions of Eqs. (7.4.30) and (7.4.31).

We can count the number of independent solutions, noting that any column
$w_n$ can be decomposed as

$$
w = w_+ + w_- , \qquad (\pm i\eta_{\mu\nu}\gamma^\mu p^\nu + m)\,w_\pm = 0 ,
$$

by taking

$$
w_\pm = \frac{(\mp i\eta_{\mu\nu}\gamma^\mu p^\nu + m)}{2m}\, w .
$$

Thus, with a total of four components, there must be just two independent
$u_n(\mathbf{p}, \sigma)$ satisfying Eq. (7.4.30) and two independent $v_n(\mathbf{p}, \sigma)$ satisfying
Eq. (7.4.31). The index $\sigma$ therefore takes just two values, corresponding to
the two values of the third component of spin for a particle of spin 1/2. Dirac
thought that the solutions $e^{-ip\cdot x}v_n(\mathbf{p}, \sigma)$ were the wave functions for a free
negatively charged electron with negative energy; instead, just as we saw for
scalar fields, they are the coefficients of the creation operator $b^\dagger$ for a positively
charged antielectron, or positron, of positive energy.
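The counting of solutions can be verified with explicit matrices. The sketch below assumes one particular choice of $\gamma^\mu$ (the standard Dirac matrices multiplied by $-i$, which satisfies Eq. (7.4.27) with $\eta = {\rm diag}(-1, +1, +1, +1)$ in the order $\mu = 0, 1, 2, 3$); the text does not specify a representation, so this choice and all names are illustrative. It checks the anticommutation relations, then confirms that the matrix taking $w$ to $w_+$ is annihilated by $i\eta_{\mu\nu}\gamma^\mu p^\nu + m$ and has trace 2, so that it projects onto a two-dimensional space.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def combine(terms):
    """Sum of coefficient * matrix over a list of (coefficient, matrix) pairs."""
    return [[sum(c * mat[i][j] for c, mat in terms) for j in range(4)] for i in range(4)]

def max_abs(a):
    return max(abs(entry) for row in a for entry in row)

IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

def off_diagonal(sigma):
    """Standard Dirac gamma^i: upper-right block sigma, lower-left block -sigma."""
    g = [[0j] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            g[i][j + 2] = sigma[i][j]
            g[i + 2][j] = -sigma[i][j]
    return g

STANDARD_0 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
# Assumed representation: gamma^mu = -i * (standard Dirac gamma^mu), index order mu = 0, 1, 2, 3.
GAMMA = [combine([(-1j, g)]) for g in [STANDARD_0] + [off_diagonal(s) for s in PAULI]]
ETA_DIAG = [-1.0, 1.0, 1.0, 1.0]

def anticommutator_error():
    """Largest deviation of gamma^mu gamma^nu + gamma^nu gamma^mu from 2 eta^{mu nu}."""
    worst = 0.0
    for mu in range(4):
        for nu in range(4):
            anti = combine([(1, matmul(GAMMA[mu], GAMMA[nu])), (1, matmul(GAMMA[nu], GAMMA[mu]))])
            target = combine([(2 * ETA_DIAG[mu] if mu == nu else 0.0, IDENTITY)])
            worst = max(worst, max_abs(combine([(1, anti), (-1, target)])))
    return worst

def p_slash(p_vec, mass):
    """eta_{mu nu} gamma^mu p^nu = gamma^i p^i - gamma^0 E(p) for on-shell momentum."""
    energy = math.sqrt(sum(c * c for c in p_vec) + mass * mass)
    return combine([(-energy, GAMMA[0])] + [(p_vec[i], GAMMA[i + 1]) for i in range(3)])

def projector(p_vec, mass, sign):
    """Matrix taking w to w_+ (sign=+1) or w_- (sign=-1)."""
    return combine([(-sign * 1j / (2 * mass), p_slash(p_vec, mass)), (0.5, IDENTITY)])

if __name__ == "__main__":
    p, m = (0.3, -0.4, 1.2), 1.0
    plus = projector(p, m, +1)
    print(anticommutator_error())
    print(sum(plus[i][i] for i in range(4)))   # trace, close to 2
```

Because the projector for $w_-$ has the same trace, the four components split evenly, two for $u$ and two for $v$.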

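In forming a Lagrangian density $\mathcal{L}(x)$, we need to include both fields and
their adjoints in such a way that $\mathcal{L}(x)$ is a scalar. Here $\sum_n \psi_n^\dagger\psi_n$ is not a
scalar, but here and more generally there is always a matrix $\beta_{nm}$ for which
$\sum_{nm} \psi_n^\dagger\beta_{nm}\psi_m$ does transform as a scalar. This is because for any matrices
$A$ and $B$, we have $(AB)^{\dagger -1} = (B^\dagger A^\dagger)^{-1} = A^{\dagger -1}B^{\dagger -1}$, so the inverse of the
adjoint $D^\dagger_{nm} \equiv D^*_{mn}$ satisfies the same multiplication rule (7.4.19) as $D$ itself:

$$
D^{\dagger -1}(\Lambda_1)\, D^{\dagger -1}(\Lambda_2) = D^{\dagger -1}(\Lambda_1\Lambda_2) .
\tag{7.4.32}
$$

This does not mean that $D^{\dagger -1}(\Lambda) = D(\Lambda)$, but for irreducible representations
they are equal up to a similarity transformation; there is a matrix $\beta$ for which

$$
D^{\dagger -1}(\Lambda) = \beta\, D(\Lambda)\,\beta^{-1} ,
$$

or, multiplying on the left with $D^\dagger$ and on the right with $\beta$,

$$
D^\dagger(\Lambda)\,\beta\, D(\Lambda) = \beta .
\tag{7.4.33}
$$

It follows then that we can define a covariant adjoint

$$
\bar{\psi}_m(x) \equiv \sum_n \psi_n^\dagger(x)\,\beta_{nm}
\tag{7.4.34}
$$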
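such that the effect of a Lorentz transformation $x \to \Lambda x$ is

$$
\bar{\psi}_m(x) \to \bar{\psi}_m^\Lambda(\Lambda x) = \sum_n \psi_n^\dagger(x)\,[D^\dagger(\Lambda)\beta]_{nm} = \sum_n \bar{\psi}_n(x)\,[D^{-1}(\Lambda)]_{nm} .
\tag{7.4.35}
$$

Thus not only is $\bar{\psi}(x)\psi(x) \equiv \sum_m \bar{\psi}_m(x)\psi_m(x)$ a scalar, but also (in the same
abbreviated notation) $\bar{\psi}(x)\gamma^\mu\psi(x)$ is a four-vector.

It is now easy to construct a Lorentz-invariant free-field Lagrangian density
from which follows the field equation (7.4.28):

$$
\mathcal{L} = -\bar{\psi}\left( \gamma^\mu\frac{\partial}{\partial x^\mu} + m \right)\psi .
\tag{7.4.36}
$$

Without going into details, the canonical anticommutation relations here give

$$
[a(\mathbf{p}, \sigma), a^\dagger(\mathbf{p}', \sigma')]_+ = [b(\mathbf{p}, \sigma), b^\dagger(\mathbf{p}', \sigma')]_+ = \delta_{\sigma\sigma'}\,\delta^3(\mathbf{p} - \mathbf{p}') ,
\tag{7.4.37}
$$

$$
[a(\mathbf{p}, \sigma), a(\mathbf{p}', \sigma')]_+ = [b(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_+
= [a(\mathbf{p}, \sigma), b(\mathbf{p}', \sigma')]_+ = [a(\mathbf{p}, \sigma), b^\dagger(\mathbf{p}', \sigma')]_+ = 0 ,
\tag{7.4.38}
$$

provided the solutions of Eqs. (7.4.30) and (7.4.31) are normalized so that

$$
\sum_\sigma u_n(\mathbf{p}, \sigma)\,\bar{u}_m(\mathbf{p}, \sigma) = [-ip^\mu\gamma_\mu + m]_{nm} , \qquad
\sum_\sigma v_n(\mathbf{p}, \sigma)\,\bar{v}_m(\mathbf{p}, \sigma) = [-ip^\mu\gamma_\mu - m]_{nm} .
\tag{7.4.39}
$$

(As usual, $[A, B]_+$ is defined as $AB + BA$.) The anticommutator of the Dirac
field with its adjoint is given by

$$
[\psi_n(x), \bar{\psi}_m(y)]_+ = \left[ -\gamma^\mu\frac{\partial}{\partial x^\mu} + m \right]_{nm}
\int \frac{d^3p}{2p^0(2\pi)^3}\left[ e^{ip\cdot(x - y)} - e^{-ip\cdot(x - y)} \right] ,
\tag{7.4.40}
$$

which obviously vanishes for $x^0 = y^0$ and hence for all space-like $x - y$, as
required by Lorentz invariance.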

We can include the interaction of the Dirac field of the electron with the
electromagnetic vector potential using the prescription given in the following
section. Replacing $\partial\psi/\partial x^\mu$ in the free-field Lagrangian density with
$\partial\psi/\partial x^\mu + ieA_\mu\psi$ gives the Lagrangian for electrons and positrons and their
interaction with electromagnetism:

$$
\mathcal{L}_{\rm Dirac} = -\bar{\psi}\left( \gamma^\mu\left[ \frac{\partial}{\partial x^\mu} + ieA_\mu \right] + m_e \right)\psi .
\tag{7.4.41}
$$
This yields the Euler-Lagrange field equation for a Dirac field interacting with
any electromagnetic field:

$$
\left( \gamma^\mu\left[ \frac{\partial}{\partial x^\mu} + ieA_\mu \right] + m_e \right)\psi = 0 .
\tag{7.4.42}
$$

The Dirac wave function used in Dirac's calculations was not the quantum
field $\psi$, but its matrix elements:

$$
\psi_{An}(x) \equiv \big(\Phi_{\rm vac}, \psi_n(x)\Phi_A\big) , \qquad
\psi_{Bn}(x) \equiv \big(\Phi_B, \psi_n(x)\Phi_{\rm vac}\big) ,
\tag{7.4.43}
$$

where $\Phi_A$ and $\Phi_B$ are states of charge $-e$ and $+e$, respectively, such as states of
an electron and a positron in the electromagnetic field of an atom. These wave
functions satisfy the same equation as the field:

$$
\left( \gamma^\mu\left[ \frac{\partial}{\partial x^\mu} + ieA_\mu \right] + m_e \right)\psi_A(x)
= \left( \gamma^\mu\left[ \frac{\partial}{\partial x^\mu} + ieA_\mu \right] + m_e \right)\psi_B(x) = 0 .
\tag{7.4.44}
$$

For a time-independent electromagnetic field, the time dependence of the Dirac
field is governed by a time-independent Hamiltonian $H$ in the Heisenberg
picture, so, for states $\Phi_A$ and $\Phi_B$ with energy $E_A$ and $E_B$, the wave functions have
the time dependences

$$
\psi_{An}(\mathbf{x}, t) \propto e^{-iE_A t} , \qquad \psi_{Bn}(\mathbf{x}, t) \propto e^{+iE_B t} .
\tag{7.4.45}
$$

The different sign of the argument of $e^{+iE_B t}$ does not arise because the state
$\Phi_B$ has negative energy, but because it appears to the left of the Dirac field in
the definition (7.4.43) of the wave function $\psi_{Bn}(\mathbf{x}, t)$. From solutions of the
wave equation (7.4.44) for $\psi_{An}(\mathbf{x}, t)$ with time dependence given by (7.4.45) and
a pure Coulomb field $A^0 = Ze/r$, $\mathbf{A} = 0$, Dirac was able to calculate the energies
of the states of hydrogenic atoms, including their fine structure:


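$$
E = m_e\left[ 1 - \frac{Z^2e^4}{2n^2} - \frac{Z^4e^8}{n^4}\left( \frac{n}{2j + 1} - \frac{3}{8} \right) + \cdots \right] ,
\tag{7.4.46}
$$

with no dependence on $\ell$.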
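To get a feeling for the size of the fine structure, the sketch below evaluates the standard expansion of the Dirac energy levels through fourth order in $Z\alpha$, $E = m_e[1 - (Z\alpha)^2/2n^2 - ((Z\alpha)^4/n^4)(n/(2j+1) - 3/8)]$, where in units with $\hbar = c = 1$ the fine-structure constant is $\alpha = e^2 \approx 1/137.036$. The numerical constants and the function name are inputs assumed for this illustration.

```python
# Assumed inputs: fine-structure constant and electron rest energy in eV.
ALPHA = 1.0 / 137.035999
ELECTRON_REST_ENERGY_EV = 510998.95

def dirac_energy_ev(n, j, z=1):
    """Energy (including rest energy) through order (Z alpha)^4 for principal number n and spin j."""
    za2 = (z * ALPHA) ** 2
    return ELECTRON_REST_ENERGY_EV * (
        1.0 - za2 / (2.0 * n * n) - (za2 * za2 / n ** 4) * (n / (2.0 * j + 1.0) - 3.0 / 8.0)
    )

if __name__ == "__main__":
    # Splitting between the j = 3/2 and j = 1/2 levels of hydrogen with n = 2.
    splitting = dirac_energy_ev(2, 1.5) - dirac_energy_ev(2, 0.5)
    print(splitting)   # in eV
```

For hydrogen with $n = 2$ the splitting comes out to $m_e\alpha^4/32$, a few times $10^{-5}$ eV, far smaller than the 10.2 eV separating $n = 1$ from $n = 2$.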

As discussed in Section 6.5, Fermi in his 1934 theory of beta decay proposed
an interaction Hamiltonian of the form (6.5.1), proportional to the scalar product
of two vector currents. This then had to be modified, first by the introduction
of axial vector currents and then by including terms that violate invariance
under space inversion, resulting in an interaction of the form (6.5.4). Expressed
explicitly in terms of Dirac fields for the proton, neutron, electron, and neutrino,
Fermi's original proposed interaction (in units with $\hbar = c = 1$) was

$$
\mathcal{H}_\beta = G_F\,(\bar{\psi}_e\gamma^\mu\psi_\nu)(\bar{\psi}_p\gamma_\mu\psi_n) + G_F\,(\bar{\psi}_\nu\gamma^\mu\psi_e)(\bar{\psi}_n\gamma_\mu\psi_p) ,
\tag{7.4.47}
$$
and after 30 years of experiments on nuclear beta decay and other weak
interaction processes, this was finally modified to

$$
\mathcal{H}_\beta = \frac{G_F}{\sqrt{2}}\,\big(\bar{\psi}_e\gamma^\mu(1 + \gamma_5)\psi_\nu\big)\big(\bar{\psi}_p\gamma_\mu(1 + \gamma_5)\psi_n\big)
+ \frac{G_F}{\sqrt{2}}\,\big(\bar{\psi}_\nu\gamma^\mu(1 + \gamma_5)\psi_e\big)\big(\bar{\psi}_n\gamma_\mu(1 + \gamma_5)\psi_p\big) ,
\tag{7.4.48}
$$

where $G_F = 1.16 \times 10^{-5}\ {\rm GeV}^{-2}$ and $\gamma_5 \equiv i\gamma_1\gamma_2\gamma_3\gamma_0$. It can be shown from
the anticommutation relations that $\gamma_5$ is Lorentz invariant, in the sense of
commuting with $D(\Lambda)$, so (7.4.48) like (7.4.47) transforms as a scalar under any
proper Lorentz transformation. It is the presence of the matrix $1 + \gamma_5$ in $\mathcal{H}_\beta$
that produces the violations of invariance under space inversion discussed in
Section 6.5, including the fact that if neutrinos were massless, the neutrinos
created along with electrons by the first term in Eq. (7.4.48) or along with
positrons in the second term in Eq. (7.4.48) would have a component of angular
momentum in the direction of motion respectively equal to $\hbar/2$ or $-\hbar/2$. For the
very small known masses of neutrinos, these helicities are overwhelmingly likely.


7.5 Quantum Theory of Electromagnetism 


We end our treatment of quantum mechanics where we began, with the quantum
theory of radiation. We will first present the Lagrangian densities both for the
free electromagnetic field and for the field's interactions with matter, then work
out in detail the theory of the free field, which as shown in Section 7.3 is needed
to provide the interaction in the interaction picture in perturbation theory, and
then apply what we have learned to a classic problem, the calculation of the
rate of emission of photons in transitions between atomic or molecular states.
We close with an account of the interaction of electromagnetism with general
matter fields.


Lagrangian Density 


It is easy to think of a possible Lagrangian density for the electromagnetic
field that is quadratic in the fields, like all free-field Lagrangians, and is Lorentz
invariant:

$$
\mathcal{L}_0 = -\frac{1}{16\pi}\,\eta_{\mu\rho}\,\eta_{\nu\sigma}\,F^{\mu\nu}F^{\rho\sigma} ,
\tag{7.5.1}
$$

where $F^{\mu\nu}$ is the field strength tensor, given by Eqs. (4.6.7) and (4.6.8):

$$
E_1 = F^{01} = -F^{10} , \qquad E_2 = F^{02} = -F^{20} , \qquad E_3 = F^{03} = -F^{30} ,
\tag{7.5.2}
$$

$$
B_1 = F^{23} = -F^{32} , \qquad B_2 = F^{31} = -F^{13} , \qquad B_3 = F^{12} = -F^{21} .
\tag{7.5.3}
$$

(The factor $-1/16\pi$ is irrelevant now but will be convenient later, when we
consider the coupling of these fields to matter.) This is manifestly Lorentz
invariant, but otherwise appears absurd. If we assume that $\int d^4x\,\mathcal{L}_0$ is stationary
under arbitrary infinitesimal variations of the fields $F^{\mu\nu}$, we find Euler-Lagrange
equations of the form $F^{\mu\nu} = 0$, which certainly do not describe actual free
electromagnetic fields. The error made in deriving this wrong result is that
we must not impose conditions for arbitrary variations of $F^{\mu\nu}$, because the
field-strength tensor is constrained by the homogeneous Maxwell equations
(4.6.15), (4.6.17):

$$
0 = \frac{\partial F_{\mu\nu}}{\partial x^\lambda} + \frac{\partial F_{\lambda\mu}}{\partial x^\nu} + \frac{\partial F_{\nu\lambda}}{\partial x^\mu} ,
\tag{7.5.4}
$$

where

$$
F_{\mu\nu} \equiv \eta_{\mu\rho}\,\eta_{\nu\sigma}\,F^{\rho\sigma} .
\tag{7.5.5}
$$

We should only demand that the action is stationary for variations in the fields 
that preserve the constraint (7.5.4). 

It is easy to see that this requirement leads to the remaining free-field
Maxwell equations $\partial F^{\mu\nu}/\partial x^\nu = 0$, but in deriving the canonical commutation
relations it is awkward to work with functional derivatives with respect to
constrained fields like $F^{\mu\nu}$. In electrodynamics it is much easier to express the
field-strength tensor in terms of an unconstrained vector potential $A_\mu$, in such a
way that the constraint (7.5.4) is automatically respected,

$$
F_{\mu\nu} = \frac{\partial A_\nu}{\partial x^\mu} - \frac{\partial A_\mu}{\partial x^\nu} ,
\tag{7.5.6}
$$

and take all functional derivatives with respect to the $A_\mu$. As shown in
Section 5.8, the introduction of a vector potential is essential anyway in
formulating the quantum theory of charged particles in an electromagnetic
field.

For the present we will introduce a general Lagrangian density $\mathcal{L}_{\rm mat}$ for
matter and its interaction with the electromagnetic field, and define the electric
current four-vector $J^\mu$ as the functional derivative with respect to $A_\mu(x)$ of the
corresponding term in the action:

$$
J^\mu(x) \equiv \frac{\delta}{\delta A_\mu(x)}\int d^4y\,\mathcal{L}_{\rm mat}(y) .
\tag{7.5.7}
$$

Under an infinitesimal shift in $A_\mu$, the change in the total action is now

$$
\delta\int d^4x\,(\mathcal{L}_0 + \mathcal{L}_{\rm mat}) = \int d^4x\left[ -\frac{1}{4\pi}\frac{\partial F^{\mu\nu}(x)}{\partial x^\nu} + J^\mu(x) \right]\delta A_\mu(x) ,
$$

and the Euler-Lagrange equations here are

$$
\frac{\partial F^{\mu\nu}(x)}{\partial x^\nu} = 4\pi J^\mu(x) ,
\tag{7.5.8}
$$

which we recognize as the inhomogeneous Maxwell equations (4.6.9) (except
that there is no factor $1/c$, because we are using natural units with $\hbar = c = 1$).


Gauge Transformations 


Now we have a problem. We cannot satisfy the canonical commutation relations
for the field $A^0$, because since $F_{00} = 0$ the Lagrangian density does not contain
a time derivative of $A^0$. To deal with this, we note that the action is invariant
under a gauge transformation

$$
A_\mu(x) \to A_\mu(x) + \frac{\partial\xi(x)}{\partial x^\mu} ,
\tag{7.5.9}
$$

with $\xi(x)$ an arbitrary function of the spacetime coordinate. This has no effect
on the field-strength tensor (7.5.6), and the consistency of the Maxwell
equations requires that the current $J^\mu$ is conserved in the sense that
$\partial J^\mu(x)/\partial x^\mu = 0$, so that according to Eq. (7.5.7) the change produced in the
matter action by the gauge transformation (7.5.9) is

$$
\delta\int d^4x\,\mathcal{L}_{\rm mat} = \int d^4x\, J^\mu(x)\,\frac{\partial\xi(x)}{\partial x^\mu} = -\int d^4x\,\frac{\partial J^\mu(x)}{\partial x^\mu}\,\xi(x) = 0 .
\tag{7.5.10}
$$
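That a gauge transformation leaves the field-strength tensor unchanged can be seen numerically. The sketch below takes an arbitrary sample potential $A_\mu(x)$ and an arbitrary sample gauge function (both invented here purely for illustration), forms $F_{\mu\nu} = \partial A_\nu/\partial x^\mu - \partial A_\mu/\partial x^\nu$ by central differences, and compares the result before and after adding the gradient of the gauge function to $A_\mu$.

```python
import math

STEP = 1e-4  # finite-difference step

def potential(x):
    """Sample components A_mu(x), mu = 0..3; an arbitrary smooth choice for illustration."""
    x0, x1, x2, x3 = x
    return [x1 * x2, x0 * x0, math.sin(x3), x1 * x0]

def gauge_gradient(x):
    """Gradient of the sample gauge function sin(x0)*x1*x2 + x3**2."""
    x0, x1, x2, x3 = x
    return [math.cos(x0) * x1 * x2, math.sin(x0) * x2, math.sin(x0) * x1, 2.0 * x3]

def transformed_potential(x):
    """A_mu plus the gradient of the gauge function."""
    return [a + g for a, g in zip(potential(x), gauge_gradient(x))]

def partial(func, x, mu, nu):
    """Central-difference estimate of the derivative of component nu with respect to x^mu."""
    forward, backward = list(x), list(x)
    forward[mu] += STEP
    backward[mu] -= STEP
    return (func(forward)[nu] - func(backward)[nu]) / (2.0 * STEP)

def field_strength(func, x):
    return [[partial(func, x, mu, nu) - partial(func, x, nu, mu) for nu in range(4)]
            for mu in range(4)]

def max_difference(x):
    """Largest change in any component of F under the gauge transformation."""
    before = field_strength(potential, x)
    after = field_strength(transformed_potential, x)
    return max(abs(before[m][n] - after[m][n]) for m in range(4) for n in range(4))

if __name__ == "__main__":
    print(max_difference([0.3, -0.7, 1.1, 0.5]))
```

The residual difference is at the level of the finite-difference error, reflecting the fact that second derivatives of the gauge function are symmetric and cancel in the antisymmetric combination.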


Coulomb Gauge 
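We can always choose $\xi(x)$ so as to adopt what is known as the Coulomb gauge,
for which

$$
\nabla\cdot\mathbf{A} = 0 ,
\tag{7.5.11}
$$

because if $\nabla\cdot\mathbf{A} \neq 0$, we can make it vanish by performing a gauge
transformation with $\nabla^2\xi = -\nabla\cdot\mathbf{A}$. This is called the Coulomb gauge because
the $\mu = 0$ component of the inhomogeneous Maxwell equations (7.5.8) is here

$$
-\nabla^2 A^0 = \frac{\partial F^{0i}}{\partial x^i} = 4\pi J^0 ,
$$

with solution given by the familiar Coulomb field

$$
A^0(\mathbf{x}, t) = \int d^3y\,\frac{J^0(\mathbf{y}, t)}{|\mathbf{x} - \mathbf{y}|} .
\tag{7.5.12}
$$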

Since $A^0$ is a functional of the matter fields in $J^0$ at the same time, it is not to
be regarded as an independent canonical variable. The canonical variables of
electrodynamics in Coulomb gauge are the spatial components $A^i$, but subject
to the constraint (7.5.11).

The condition (7.5.11) for Coulomb gauge is obviously not Lorentz invariant.
Given a vector potential $A^\mu(x)$ that satisfies this condition, the
Lorentz-transformed vector potential $\Lambda^\mu{}_\nu A^\nu$ will in general not satisfy Eq. (7.5.11)
if $\Lambda$ is anything but a pure rotation. However, we can always combine any Lorentz
transformation with a gauge transformation that takes the vector potential back
to Coulomb gauge. Since the action is presumed to be gauge invariant, the
physical consequences of the theory calculated in Coulomb gauge turn out to
be Lorentz invariant.

The virtue of Coulomb gauge, which here makes up for its lack of manifest
Lorentz invariance, is that it displays the physical degrees of freedom of
electrodynamics. Even though $A^\mu$ has four components, as we have seen in
Coulomb gauge $A^0$ is a functional of matter fields, and $\nabla\cdot\mathbf{A}$ vanishes. We shall
see that the two remaining degrees of freedom are the two independent states of
photon polarization. It must be admitted, however, that, as a practical matter, in
carrying out calculations in quantum electrodynamics more complicated than
those essayed here, it is necessary to use techniques that preserve manifest
Lorentz invariance, such as the path integral approach of Feynman.¹⁵

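Now we have to consider what is the canonical conjugate to $\mathbf{A}$. According
to the usual definition of a functional derivative, if we make an infinitesimal
variation $\dot{\mathbf{A}} \to \dot{\mathbf{A}} + \delta\dot{\mathbf{A}}$, then

$$
\delta L(t) = \int d^3x\,\frac{\delta L(t)}{\delta\dot{A}_i(\mathbf{x}, t)}\,\delta\dot{A}_i(\mathbf{x}, t) ,
$$

but, since $A_i$ is constrained by (7.5.11), we are only allowed to consider
variations satisfying $\partial\,\delta\dot{A}_i/\partial x^i = 0$, so $\delta L(t)/\delta\dot{A}_i(\mathbf{x}, t)$ is only defined up to
gradient terms, of the form $\partial f/\partial x^i$. A direct calculation gives

$$
\frac{\delta L}{\delta\dot{A}_i} = \frac{1}{4\pi}\left[ \dot{A}_i + \frac{\partial A^0}{\partial x^i} \right] ,
$$

but we need to take advantage of our freedom to shift this functional derivative
by the gradient $-(\partial A^0/\partial x^i)/4\pi$, and take the canonical conjugate to $A_i$ as

$$
\pi_i = \dot{A}_i/4\pi ,
\tag{7.5.13}
$$

so that $\pi_i$ satisfies the same constraint as $A_i$:

$$
\nabla\cdot\boldsymbol{\pi} = 0 .
\tag{7.5.14}
$$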


15 R. P. Feynman, Ph.D. thesis, The Principle of Least Action in Quantum Mechanics (Princeton University,
1942; University Microfilms Publication No. 2948, Ann Arbor).

The usual canonical commutation relations must here be modified to take account
of the conditions (7.5.11) and (7.5.14). We use the formula

$$\nabla^2\frac{1}{|\mathbf{x}-\mathbf{y}|} = -4\pi\,\delta^3(\mathbf{x}-\mathbf{y}) .$$

(This can be derived by showing directly that the left-hand side vanishes for
$\mathbf{x} \neq \mathbf{y}$, and using Gauss's theorem to show that its integral over all space is
$-4\pi$.) Then we have consistency with conditions (7.5.11) and (7.5.14) if we
take

$$[A_i(\mathbf{x},t), \pi_j(\mathbf{y},t)] = i\delta_{ij}\,\delta^3(\mathbf{x}-\mathbf{y}) + \frac{i}{4\pi}\frac{\partial^2}{\partial x^i\,\partial x^j}\frac{1}{|\mathbf{x}-\mathbf{y}|} \tag{7.5.15}$$


and also 


$$[A_i(\mathbf{x},t), A_j(\mathbf{y},t)] = [\pi_i(\mathbf{x},t), \pi_j(\mathbf{y},t)] = 0 . \tag{7.5.16}$$


Free Fields 


As emphasized in Section 7.3, the first step in using time-ordered perturbation
theory to calculate processes involving interacting particles is to write explicit
formulas for the free fields. With zero current and charge densities, and hence
$A^0 = 0$, the field equations (7.5.8) for $A^i$ in Coulomb gauge are

$$0 = \nabla^2 A^i - \frac{\partial^2 A^i}{\partial t^2} = \Box A^i . \tag{7.5.17}$$


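The general real solution of Eqs. (7.5.11) and (7.5.17) is conveniently written

$$\mathbf{A}(\mathbf{x},t) = \sqrt{4\pi}\sum_\lambda\int\frac{d^3q}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}}\Big[\mathbf{e}(\mathbf{q},\lambda)\,a(\mathbf{q},\lambda)\,e^{i\mathbf{q}\cdot\mathbf{x}-i|\mathbf{q}|t} + \mathbf{e}^*(\mathbf{q},\lambda)\,a^\dagger(\mathbf{q},\lambda)\,e^{-i\mathbf{q}\cdot\mathbf{x}+i|\mathbf{q}|t}\Big] , \tag{7.5.18}$$

where $a(\mathbf{q},\lambda)$ is an operator coefficient whose properties will be found from the
canonical commutation relations, and $\mathbf{e}(\mathbf{q},\lambda)$ are any two independent
three-vectors normal to $\mathbf{q}$,

$$\mathbf{q}\cdot\mathbf{e}(\mathbf{q},\lambda) = 0 , \tag{7.5.19}$$

with $\lambda$ a two-valued index distinguishing the two solutions of (7.5.19). By a
suitable normalization of $a(\mathbf{q},\lambda)$, we can always normalize these vectors so that

$$\sum_\lambda e_i(\mathbf{q},\lambda)\,e_j^*(\mathbf{q},\lambda) = \delta_{ij} - q_i q_j/|\mathbf{q}|^2 . \tag{7.5.20}$$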


For instance, for $\mathbf{q}$ in the 3-direction, we can take $\mathbf{e} = (1, i, 0)/\sqrt{2}$ for $\lambda = 1$,
and $\mathbf{e} = (1, -i, 0)/\sqrt{2}$ for $\lambda = -1$, and, for $\mathbf{q}$ in a direction defined by
some choice of rotation from the 3-direction, apply the same rotation to $\mathbf{e}$.
These are the same as the polarization vectors for left- and right-handed circular
polarization that appear in the Fourier expansion of an electromagnetic wave.
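
As a quick numerical sanity check of Eqs. (7.5.19) and (7.5.20) for the example just given, the following sketch builds the two circular polarization vectors for $\mathbf{q}$ along the 3-direction and compares the helicity sum with $\delta_{ij} - \hat q_i\hat q_j$. It is only an illustration in plain Python; the function names are hypothetical and are not taken from the text.

```python
import math

# Unit vector along the photon momentum (assumed here to lie along the 3-axis).
Q_HAT = (0.0, 0.0, 1.0)

def polarization(lam):
    """Polarization vector (1, lam*i, 0)/sqrt(2) for helicity lam = +1 or -1."""
    s = 1.0 / math.sqrt(2.0)
    return (complex(s), lam * 1j * s, 0j)

def transversality(lam):
    """q_hat . e(q, lam); Eq. (7.5.19) says this should vanish."""
    e = polarization(lam)
    return sum(Q_HAT[i] * e[i] for i in range(3))

def polarization_sum(i, j):
    """Left-hand side of Eq. (7.5.20): sum over lam of e_i e_j^*."""
    return sum(polarization(lam)[i] * polarization(lam)[j].conjugate()
               for lam in (+1, -1))

def transverse_projector(i, j):
    """Right-hand side of Eq. (7.5.20) for a unit vector q_hat."""
    return (1.0 if i == j else 0.0) - Q_HAT[i] * Q_HAT[j]

# Largest deviation between the two sides over all index pairs.
max_deviation = max(abs(polarization_sum(i, j) - transverse_projector(i, j))
                    for i in range(3) for j in range(3))
print("max deviation:", max_deviation)
```

For a general direction of $\mathbf{q}$ one would rotate both `Q_HAT` and the polarization vectors by the same rotation, as described above; the comparison itself is unchanged.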

With this normalization of the polarization vectors, the field (7.5.18) satisfies
the canonical commutation relations (7.5.15)-(7.5.16) if we take

$$[a(\mathbf{q},\lambda), a^\dagger(\mathbf{q}',\lambda')] = \delta_{\lambda\lambda'}\,\delta^3(\mathbf{q}-\mathbf{q}') , \tag{7.5.21}$$

and

$$[a(\mathbf{q},\lambda), a(\mathbf{q}',\lambda')] = 0 . \tag{7.5.22}$$

Then, just as we saw for a real scalar field in Section 7.2, the operator $a^\dagger(\mathbf{q},\lambda)$
creates a photon of momentum $\mathbf{q}$ and polarization vector $\mathbf{e}(\mathbf{q},\lambda)$ in any state
vector on which it acts, while if there already is such a photon in the state, the
operator $a(\mathbf{q},\lambda)$ removes it.

To see the physical significance of $\lambda$, note that for $\mathbf{q}$ in the 3-direction, if we
perform a rotation by angle $\theta$ around the 3-axis,

$$e_1 \to e_1\cos\theta + e_2\sin\theta , \qquad e_2 \to -e_1\sin\theta + e_2\cos\theta ,$$

then the polarization vectors change by phases as follows:

$$\mathbf{e}(\mathbf{q},\pm 1) \to e^{\mp i\theta}\,\mathbf{e}(\mathbf{q},\pm 1) .$$

Since there is nothing special about the 3-direction, this is the effect of rotation
by angle $\theta$ around the direction of motion for a photon moving in any direction.
In accordance with the general discussion of angular momentum in Section 5.4,
this means that a photon created by $a^\dagger(\mathbf{q},\lambda)$ has a component of angular
momentum around the direction of motion, that is a helicity, equal to $\hbar\lambda$ in cgs
units.

To calculate the free-field Hamiltonian, we first note that, since $A^0 = 0$ for
free fields, the free-field Hamiltonian density is

$$\mathcal{H}_0 = \frac{1}{8\pi}\Big[\dot{A}_i\dot{A}_i + \partial_i A_j\,\partial_i A_j - \partial_i A_j\,\partial_j A_i\Big] ,$$

where as usual $i$ and $j$ run over the values 1, 2, 3, and repeated indices are
summed. Using integration by parts and the Coulomb gauge condition (7.5.11)
we find the free-field Hamiltonian

$$H_0 = \int d^3x\,\mathcal{H}_0 = \frac{1}{8\pi}\int d^3x\,\Big[\dot{A}_i\dot{A}_i + \partial_i A_j\,\partial_i A_j\Big] .$$

Inserting the field (7.5.18) and following just the same steps as in calculating
the free-field Hamiltonian for a scalar field in Section 7.2, we find the free-field
Hamiltonian for electromagnetism

$$H_0 = \frac{1}{2}\sum_\lambda\int d^3q\,|\mathbf{q}|\Big(a^\dagger(\mathbf{q},\lambda)a(\mathbf{q},\lambda) + a(\mathbf{q},\lambda)a^\dagger(\mathbf{q},\lambda)\Big) = \sum_\lambda\int d^3q\,|\mathbf{q}|\,a^\dagger(\mathbf{q},\lambda)a(\mathbf{q},\lambda) + E_{\rm vac} , \tag{7.5.23}$$

where

$$E_{\rm vac} = \delta^3(0)\int d^3q\,|\mathbf{q}| = (2\pi)^{-3}\int d^3x\int d^3q\,|\mathbf{q}| . \tag{7.5.24}$$


As in the case of the real scalar field treated in Section 7.2, the vacuum $\Phi_{\rm vac}$,
defined as the state of lowest energy, must satisfy the condition

$$a(\mathbf{q},\lambda)\Phi_{\rm vac} = 0 , \tag{7.5.25}$$

since otherwise there would be a state $a(\mathbf{q},\lambda)\Phi_{\rm vac}$ with a lower energy than
$\Phi_{\rm vac}$. Thus

$$H_0\Phi_{\rm vac} = E_{\rm vac}\Phi_{\rm vac} . \tag{7.5.26}$$

The energy (7.5.24) is a contribution to the total vacuum energy that must
be added to the contributions of all other fields, such as (7.2.16). The state
consisting of a photon with momentum $\mathbf{q}_1$ and helicity $\lambda_1$, another photon with
momentum $\mathbf{q}_2$ and helicity $\lambda_2$, and so on, may be expressed as

$$\Phi_{\mathbf{q}_1,\lambda_1;\,\mathbf{q}_2,\lambda_2;\,\ldots} \propto a^\dagger(\mathbf{q}_1,\lambda_1)\,a^\dagger(\mathbf{q}_2,\lambda_2)\cdots\Phi_{\rm vac} , \tag{7.5.27}$$

and has energy $E_{\rm vac} + |\mathbf{q}_1| + |\mathbf{q}_2| + \cdots$. The term $E_{\rm vac}$ appears in the energy
of all states, and so aside from gravitational phenomena may be ignored, as we
shall do here.


Radiative Decay 


We now consider the rate at which an excited atom¹⁶ will drop into a state
of lower energy, emitting a photon. We shall neglect relativistic effects and
the interaction of the electromagnetic field with the electron spin, so that the
Hamiltonian for the atom interacting with the electromagnetic field is given by
a sum over the particles in the atom of terms of form (5.8.3). Since we are
interested in the emission only of a single photon, the relevant interaction term
is the part of this sum linear in $\mathbf{A}$:

$$V = -\sum_n\frac{e_n}{2m_n}\Big[\mathbf{A}(\mathbf{X}_n)\cdot\mathbf{P}_n + \mathbf{P}_n\cdot\mathbf{A}(\mathbf{X}_n)\Big] ,$$

where $e_n$ and $m_n$ are the charge and mass of the $n$th particle (electron or
nucleus) while $\mathbf{X}_n$ and $\mathbf{P}_n$ are the position and momentum operators of the $n$th
particle and $\mathbf{A}(\mathbf{X})$ is the quantum vector potential in the Schrödinger picture.
Because we are using Coulomb gauge, in which $\mathbf{A}$ satisfies Eq. (7.5.11), it
makes no difference in what order we write the operators in $V$, and we can
just as well write

$$V = -\sum_n\frac{e_n}{m_n}\,\mathbf{A}(\mathbf{X}_n)\cdot\mathbf{P}_n . \tag{7.5.28}$$

16 The calculations here of radiative decay rates apply to molecules as well as to atoms, but to avoid repeating
“or molecules” again and again, I will just refer below to transitions in atoms.


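We take the initial and final states of the atom to be eigenstates $\Phi_{i,\mathbf{p}_i}$ and $\Phi_{f,\mathbf{p}_f}$
of the Hamiltonian of the atom, with energies $E_i$ and $E_f$, respectively, and
with total momenta $\mathbf{p}_i$ and $\mathbf{p}_f$, respectively. (Because atomic nuclei are heavy,
the kinetic energies of the states of the whole atom are always much less than
$E_i - E_f$, and so will be neglected.) The atomic state vectors are assumed to be
normalized so that

$$\big(\Phi_{a',\mathbf{p}'}, \Phi_{a,\mathbf{p}}\big) = \delta_{a'a}\,\delta^3(\mathbf{p}'-\mathbf{p}) . \tag{7.5.29}$$

Each of these states is a vacuum as far as photons are concerned, so, for any
photon momentum $\mathbf{q}$ and helicity $\lambda$,

$$a(\mathbf{q},\lambda)\Phi_{i,\mathbf{p}_i} = a(\mathbf{q},\lambda)\Phi_{f,\mathbf{p}_f} = 0 . \tag{7.5.30}$$

The initial state of the radiative decay process is then $\Phi_{i,\mathbf{p}_i}$, and the final state is
$a^\dagger(\mathbf{q},\lambda)\Phi_{f,\mathbf{p}_f}$, with $\mathbf{q}$ and $\lambda$ the momentum and helicity of the emitted photon.
To first order in $V$ we can treat $\mathbf{A}$ in Eq. (7.5.28) as a free field, so to this order
the S-matrix element (5.6.36) for the decay process is

$$\begin{aligned}
S[i(\mathbf{p}_i)\to f(\mathbf{p}_f)+\gamma(\mathbf{q},\lambda)]
&= -2\pi i\,\delta(E_f+|\mathbf{q}|-E_i)\,\big(a^\dagger(\mathbf{q},\lambda)\Phi_{f,\mathbf{p}_f},\,V\Phi_{i,\mathbf{p}_i}\big)\\
&= -2\pi i\,\delta(E_f+|\mathbf{q}|-E_i)\,\big(\Phi_{f,\mathbf{p}_f},\,a(\mathbf{q},\lambda)V\Phi_{i,\mathbf{p}_i}\big)\\
&= 2\pi i\,\delta(E_f+|\mathbf{q}|-E_i)\,\sqrt{4\pi}\sum_{\lambda'}\int\frac{d^3q'}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}'|}}\\
&\qquad\times\sum_n\frac{e_n}{m_n}\big(\Phi_{f,\mathbf{p}_f},\,a(\mathbf{q},\lambda)\,\mathbf{e}^*(\mathbf{q}',\lambda')\cdot\mathbf{P}_n\,e^{-i\mathbf{q}'\cdot\mathbf{X}_n}\,a^\dagger(\mathbf{q}',\lambda')\Phi_{i,\mathbf{p}_i}\big) .
\end{aligned}\tag{7.5.31}$$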


Using the photon vacuum condition (7.5.30) and the commutation relation
(7.5.21), we can replace the product $a(\mathbf{q},\lambda)a^\dagger(\mathbf{q}',\lambda')$ with $\delta^3(\mathbf{q}-\mathbf{q}')\delta_{\lambda\lambda'}$, and
do the integral over $\mathbf{q}'$ and the sum over $\lambda'$ by just setting $\mathbf{q}' = \mathbf{q}$ and $\lambda' = \lambda$, so

$$S[i(\mathbf{p}_i)\to f(\mathbf{p}_f)+\gamma(\mathbf{q},\lambda)] = \frac{2\pi i\sqrt{4\pi}\,\delta(E_f+|\mathbf{q}|-E_i)}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}}\sum_n\frac{e_n}{m_n}\big(\Phi_{f,\mathbf{p}_f},\,\mathbf{e}^*(\mathbf{q},\lambda)\cdot\mathbf{P}_n\,e^{-i\mathbf{q}\cdot\mathbf{X}_n}\,\Phi_{i,\mathbf{p}_i}\big) . \tag{7.5.32}$$

At this point we make a further approximation, known as the electric dipole
approximation. The wavelength $2\pi/|\mathbf{q}|$ of the emitted photon is typically at least
hundreds or thousands of angstroms, while the mean separations of electrons
from the center of mass of the atom are typically a few angstroms. It is therefore
usually a good approximation (as long as selection rules to be discussed below
do not require the result to vanish) to replace each particle position $\mathbf{X}_n$ in the
exponent in Eq. (7.5.32) with the center-of-mass coordinate vector

$$\mathbf{X} = \sum_n m_n\mathbf{X}_n/M , \qquad M = \sum_n m_n . \tag{7.5.33}$$


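Now, using the commutators of the momentum and position operators,

$$\Big[\sum_n\mathbf{P}_n,\ \exp(i\mathbf{q}\cdot\mathbf{X})\Big] = \mathbf{q}\,\exp(i\mathbf{q}\cdot\mathbf{X}) , \tag{7.5.34}$$

so¹⁷

$$\exp(i\mathbf{q}\cdot\mathbf{X})\,\Phi_{f,\mathbf{p}_f} = \Phi_{f,\mathbf{p}_f+\mathbf{q}} . \tag{7.5.35}$$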


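Hence, replacing all $\mathbf{X}_n$ in the exponent in Eq. (7.5.32) with $\mathbf{X}$, and letting
the adjoint of this exponential act on the final state, we have

$$S[i(\mathbf{p}_i)\to f(\mathbf{p}_f)+\gamma(\mathbf{q},\lambda)] = \frac{2\pi i\sqrt{4\pi}\,\delta(E_f+|\mathbf{q}|-E_i)}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}}\sum_n\frac{e_n}{m_n}\big(\Phi_{f,\mathbf{p}_f+\mathbf{q}},\,\mathbf{e}^*(\mathbf{q},\lambda)\cdot\mathbf{P}_n\,\Phi_{i,\mathbf{p}_i}\big) . \tag{7.5.36}$$

The operators $\mathbf{P}_n$ all commute with the total momentum, so we can write their
matrix elements as

$$\big(\Phi_{f,\mathbf{p}_f+\mathbf{q}},\,\mathbf{P}_n\Phi_{i,\mathbf{p}_i}\big) = \delta^3(\mathbf{p}_f-\mathbf{p}_i+\mathbf{q})\,(\mathbf{P}_n)_{fi} , \tag{7.5.37}$$

and so

$$S[i(\mathbf{p}_i)\to f(\mathbf{p}_f)+\gamma(\mathbf{q},\lambda)] = -2\pi i\,\delta(E_f-E_i+|\mathbf{q}|)\,\delta^3(\mathbf{p}_f-\mathbf{p}_i+\mathbf{q})\,M[i(\mathbf{p}_i)\to f(\mathbf{p}_f)+\gamma(\mathbf{q},\lambda)] , \tag{7.5.38}$$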


17 This argument does not rule out the possible presence of a numerical factor multiplying the right-hand
side of Eq. (7.5.35). Any such factor of proportionality would have to have absolute magnitude unity,
because $[\exp(i\mathbf{q}\cdot\mathbf{X})]^\dagger\exp(i\mathbf{q}\cdot\mathbf{X}) = 1$, and we define both $\Phi_{f,\mathbf{p}_f}$ and $\Phi_{f,\mathbf{p}_f+\mathbf{q}}$ to be normalized in
accordance with Eq. (7.5.29). Such a phase factor would depend on our arbitrary choice of the phase of
the state $\Phi_{f,\mathbf{p}_f}$ as a function of $\mathbf{p}_f$ and can be defined to be unity, but in any case it cannot affect the
radiative transition rate, which is proportional to the absolute value squared of the matrix element for the
transition. So this possible phase factor will be ignored here.

where

$$M[i(\mathbf{p}_i)\to f(\mathbf{p}_f)+\gamma(\mathbf{q},\lambda)] = -\frac{\sqrt{4\pi}}{(2\pi)^{3/2}\sqrt{2|\mathbf{q}|}}\sum_n\frac{e_n}{m_n}\,\mathbf{e}^*(\mathbf{q},\lambda)\cdot(\mathbf{P}_n)_{fi} . \tag{7.5.39}$$


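To see how this is calculated in wave mechanics, note for example that in
hydrogen the initial and final atomic wave functions take the form

$$\psi_{i,\mathbf{p}_i}(\mathbf{x},\mathbf{X}) = \frac{\exp(i\mathbf{p}_i\cdot\mathbf{X})}{(2\pi)^{3/2}}\,\psi_i(\mathbf{x}) , \qquad \psi_{f,\mathbf{p}_f+\mathbf{q}}(\mathbf{x},\mathbf{X}) = \frac{\exp(i[\mathbf{p}_f+\mathbf{q}]\cdot\mathbf{X})}{(2\pi)^{3/2}}\,\psi_f(\mathbf{x}) ,$$

where $\mathbf{x}$ is the vector separation of the electron and proton and $\mathbf{X}$ is the
coordinate vector of the center of mass. With $m_e \ll m_p$, the matrix element of the
electron momentum operator $\mathbf{P}_e$ is

$$\big(\Phi_{f,\mathbf{p}_f+\mathbf{q}},\,\mathbf{P}_e\Phi_{i,\mathbf{p}_i}\big) = -i\,\delta^3(\mathbf{p}_f-\mathbf{p}_i+\mathbf{q})\int d^3x\,\psi_f^*(\mathbf{x})\,\nabla\psi_i(\mathbf{x}) ,$$

which has the same form as Eq. (7.5.37), with

$$(\mathbf{P}_e)_{fi} = -i\int d^3x\,\psi_f^*(\mathbf{x})\,\nabla\psi_i(\mathbf{x}) .$$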


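Using Eq. (7.5.39) in Eq. (5.6.45) (with the number $N_I$ of particles in the
initial state equal to one), the differential decay rate is

$$\begin{aligned}
d\Gamma(i\to f+\mathbf{q},\lambda) &= 2\pi\,\big|M[i(\mathbf{p}_i)\to f(\mathbf{p}_f)+\gamma(\mathbf{q},\lambda)]\big|^2\,\delta(E_f+|\mathbf{q}|-E_i)\,\delta^3(\mathbf{p}_f+\mathbf{q}-\mathbf{p}_i)\,d^3q\,d^3p_f\\
&= \frac{1}{2\pi|\mathbf{q}|}\,\Big|\sum_n\frac{e_n}{m_n}(\mathbf{P}_n)_{fi}\cdot\mathbf{e}^*(\mathbf{q},\lambda)\Big|^2\,\delta(E_f+|\mathbf{q}|-E_i)\,\delta^3(\mathbf{p}_f+\mathbf{q}-\mathbf{p}_i)\,d^3q\,d^3p_f .
\end{aligned}\tag{7.5.40}$$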


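The momentum-conservation delta function just goes to fix the recoil momentum
$\mathbf{p}_f = \mathbf{p}_i - \mathbf{q}$, and the energy-conservation delta function fixes the photon
energy $|\mathbf{q}| = E_i - E_f$. Writing $d^3q = |\mathbf{q}|^2\,d|\mathbf{q}|\,d\Omega_q$, we are left with the rate
for emission of a photon with helicity $\lambda$ into a small solid angle $d\Omega_q$:

$$d\Gamma(i\to f+\mathbf{q},\lambda) = \frac{|\mathbf{q}|}{2\pi}\,\Big|\sum_n\frac{e_n}{m_n}(\mathbf{P}_n)_{fi}\cdot\mathbf{e}^*(\mathbf{q},\lambda)\Big|^2\,d\Omega_q . \tag{7.5.41}$$

In the common case where the photon helicity is not measured, the observed
rate is given by a sum over helicities. This sum can be calculated using
Eq. (7.5.20):

$$\sum_\lambda e_j(\mathbf{q},\lambda)\,e_k^*(\mathbf{q},\lambda) = \delta_{jk} - \hat{q}_j\hat{q}_k ,$$

where $\hat{\mathbf{q}}$ is the unit vector $\mathbf{q}/|\mathbf{q}|$. The observed differential decay rate for
emission of a photon with momentum $\mathbf{q}$ into a small solid angle $d\Omega_q$ is

$$\sum_\lambda d\Gamma(i\to f+\mathbf{q},\lambda) = \frac{|\mathbf{q}|}{2\pi}\sum_{n,m}\frac{e_n}{m_n}\frac{e_m}{m_m}\,(P_{nj})_{fi}\,(P_{mk})_{fi}^*\,\big(\delta_{jk}-\hat{q}_j\hat{q}_k\big)\,d\Omega_q . \tag{7.5.42}$$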


We can now easily integrate over the photon direction, using

$$\int d\Omega_q\,\big[\delta_{jk}-\hat{q}_j\hat{q}_k\big] = 4\pi\,\delta_{jk}\Big[1-\frac{1}{3}\Big] = \frac{8\pi}{3}\,\delta_{jk} .$$
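
The angular average used here can be checked numerically. The sketch below is a plain-Python midpoint-rule quadrature over the photon direction (the function name and grid sizes are illustrative choices, not anything from the text); it evaluates $\int d\Omega_q\,[\delta_{jk}-\hat q_j\hat q_k]$ component by component for comparison with $8\pi/3$ on the diagonal and zero off the diagonal.

```python
import math

def angular_integral(j, k, n_theta=100, n_phi=100):
    """Midpoint-rule estimate of the integral over solid angle of
    (delta_jk - qhat_j * qhat_k), with qhat the unit vector at (theta, phi)."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    delta = 1.0 if j == k else 0.0
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            q_hat = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
            # Solid-angle element is sin(theta) d(theta) d(phi).
            total += (delta - q_hat[j] * q_hat[k]) * sin_t * d_theta * d_phi
    return total

diagonal = [angular_integral(j, j) for j in range(3)]
off_diagonal = angular_integral(0, 1)
print(diagonal, off_diagonal, 8.0 * math.pi / 3.0)
```

The three diagonal entries agree with each other because nothing singles out a direction once the photon direction has been integrated over, which is why the result must be proportional to $\delta_{jk}$.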


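The total decay rate for emission of a photon in any direction with any helicity
is then, in Einstein's notation,

$$A_i^f = \int d\Omega_q\sum_\lambda\frac{d\Gamma(i\to f+\mathbf{q},\lambda)}{d\Omega_q} = \frac{4|\mathbf{q}|}{3}\,\Big|\sum_n\frac{e_n}{m_n}(\mathbf{P}_n)_{fi}\Big|^2 . \tag{7.5.43}$$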


Section 3.5 shows how to use this also to calculate the rates of absorption and 
stimulated emission of radiation. 

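Calculations are made easier if we replace matrix elements of momentum
vectors with matrix elements of position vectors. For this, we use the
commutator

$$[\mathbf{X}_n, H] = \Big[\mathbf{X}_n, \frac{\mathbf{P}_n^2}{2m_n}\Big] = \frac{i}{m_n}\,\mathbf{P}_n ,$$

so

$$\big(\Phi_{f,\mathbf{p}_f+\mathbf{q}},\,\mathbf{P}_n\Phi_{i,\mathbf{p}_i}\big) = -i(E_i-E_f)\,m_n\,\big(\Phi_{f,\mathbf{p}_f+\mathbf{q}},\,\mathbf{X}_n\Phi_{i,\mathbf{p}_i}\big) = -i|\mathbf{q}|\,m_n\,\big(\Phi_{f,\mathbf{p}_f+\mathbf{q}},\,\mathbf{X}_n\Phi_{i,\mathbf{p}_i}\big) .$$

Therefore the decay rate (7.5.43) may be written

$$A_i^f = \frac{4|\mathbf{q}|^3}{3}\,\Big|\sum_n e_n(\mathbf{X}_n)_{fi}\Big|^2 , \tag{7.5.44}$$

where

$$\big(\Phi_{f,\mathbf{p}_f+\mathbf{q}},\,\mathbf{X}_n\Phi_{i,\mathbf{p}_i}\big) = \delta^3(\mathbf{p}_f-\mathbf{p}_i+\mathbf{q})\,(\mathbf{X}_n)_{fi} . \tag{7.5.45}$$

In cgs units, Eq. (7.5.44) takes the form

$$A_i^f = \frac{4\omega^3}{3\hbar c^3}\,\Big|\sum_n e_n(\mathbf{X}_n)_{fi}\Big|^2 ,$$

where $\omega = |\mathbf{q}|c/\hbar$ is the photon circular frequency. This formula was guessed
by Heisenberg¹⁸ in 1925 by setting the radiation power emitted in the transition
$i \to f$ that had been calculated in classical electrodynamics¹⁹ equal to $\hbar\omega A_i^f$.
He used this formula as a starting point in his matrix mechanics approach to
quantum mechanics. The quantum-mechanical derivation was first given by
Dirac²⁰ in 1927.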
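
To give a feeling for the size of this rate, here is a rough numerical sketch for the $2p \to 1s$ transition in hydrogen. It rests on inputs that are not derived in the text and should be read as assumptions: the standard textbook value $|(\mathbf{X})_{fi}|^2 = (2^{15}/3^{10})\,a_0^2$ for the squared dipole matrix element, an infinitely heavy proton, and rounded values of the physical constants. The cgs formula is rewritten using $e^2 = \alpha\hbar c$ as $A = \tfrac{4}{3}\alpha\,\omega^3|(\mathbf{X})_{fi}|^2/c^2$, which is convenient for SI-valued inputs.

```python
# Rough estimate of the 2p -> 1s (Lyman alpha) spontaneous emission rate.
# All numerical inputs below are assumed standard values, not taken from the text.
ALPHA = 1.0 / 137.035999          # fine-structure constant e^2/(hbar c), Gaussian units
HBAR_EV_S = 6.582119569e-16       # hbar in eV s
C_M_PER_S = 2.99792458e8          # speed of light in m/s
BOHR_RADIUS_M = 5.29177210903e-11 # Bohr radius in m
RYDBERG_EV = 13.605693            # hydrogen binding energy, infinite proton mass

# Photon energy E_2 - E_1 = (1 - 1/4) Rydberg, and the circular frequency omega.
photon_energy_ev = 0.75 * RYDBERG_EV
omega = photon_energy_ev / HBAR_EV_S

# Assumed textbook dipole matrix element: |<1s| X |2p>|^2 = (2^15 / 3^10) a0^2.
dipole_sq = (2.0 ** 15 / 3.0 ** 10) * BOHR_RADIUS_M ** 2

# A = 4 omega^3 e^2 |X_fi|^2 / (3 hbar c^3), with e^2 = alpha * hbar * c.
rate_per_s = (4.0 / 3.0) * ALPHA * omega ** 3 * dipole_sq / C_M_PER_S ** 2
lifetime_s = 1.0 / rate_per_s
print("rate [1/s]:", rate_per_s, " lifetime [s]:", lifetime_s)
```

The result comes out at several times $10^8$ per second, that is, a lifetime of order a nanosecond, which is the familiar scale for allowed optical and ultraviolet transitions.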


Selection Rules 


As we have already warned, the electric dipole approximation is not useful if
selection rules give zero for the decay matrix element. We can derive the
selection rules from either Eq. (7.5.43) or Eq. (7.5.44). First, as shown in Section 5.2,
the components of the operator $\mathbf{X}$ can be assembled into the spherical harmonics
for $\ell = 1$,

$$\mp\sqrt{\frac{3}{8\pi}}\,(X_1 \pm iX_2) = |\mathbf{X}|\,Y_1^{\pm 1}(\mathbf{X}/|\mathbf{X}|) , \qquad \sqrt{\frac{3}{4\pi}}\,X_3 = |\mathbf{X}|\,Y_1^0(\mathbf{X}/|\mathbf{X}|) .$$

According to the rules for addition of angular momenta set out in Section 5.4,
if the initial atom at rest has total angular momentum quantum number $j_i$, then
the states $\mathbf{X}_n\Phi_{i,\mathbf{p}_i}$ for $\mathbf{p}_i = 0$ can only have total angular momentum quantum
number $j_f$ equal to $j_i + 1$, $j_i$, or $j_i - 1$ and, furthermore, if $j_i = 0$ then only
$j_f = 1$ is possible; $j_f = 0$ is only possible if $j_i = 1$. Hence radiative decay
does not occur in the electric dipole approximation unless the initial and final
atomic states satisfy the selection rule

$$|j_i - j_f| \le 1 \le j_i + j_f . \tag{7.5.46}$$


There is a further selection rule that follows from space inversion symmetry.
As we saw in Section 5.4, if we change the sign of each of the three Cartesian
coordinates, any state vector $\Phi$ is changed to $\Pi\Phi$, where the operator $\Pi$ is
unitary in the sense that $\Pi^\dagger\Pi = 1$, and, since making two space inversions in
succession changes nothing, also $\Pi^2 = 1$. Physical states therefore can be chosen
as eigenstates of $\Pi$ with eigenvalue, known as the parity of the state, equal
to $+1$ or $-1$. The coordinate vector is obviously odd under space inversion, so
$\Pi^{-1}\mathbf{X}_n\Pi = -\mathbf{X}_n$. Hence if the initial and final atomic states have parity $\pi_i$ and
$\pi_f$, the transition rate will vanish in the electric dipole approximation unless the
initial and final parities satisfy the selection rule

$$\pi_f = -\pi_i . \tag{7.5.47}$$

18 W. Heisenberg, Z. Physik 33, 879 (1925); reprinted in English in Van der Waerden, Sources of Quantum
Mechanics, listed in the bibliography.
19 J. Larmor, Phil. Mag. S.5 44, 503 (1897).
20 P. A. M. Dirac, Proc. Roy. Soc. A 114, 710 (1927).


For instance, in hydrogen the transition $2p \to 1s$ (ignoring spin) has $j_i = 1$,
$\pi_i = -$, $j_f = 0$, $\pi_f = +$, so it satisfies the selection rules (7.5.46) and (7.5.47)
and is therefore predominantly an electric dipole transition. This is the Lyman
alpha ultraviolet transition. On the other hand, in the electric dipole approximation
the $2s$, $3s$, and $3d$ states are forbidden by both selection rules from decaying
into the $1s$ ground state.

Of course the electric dipole approximation is just an approximation. Instead
of simply replacing the coordinates $\mathbf{X}_n$ in the exponent in Eq. (7.5.32) with the
center-of-mass coordinate vector $\mathbf{X}$, we can expand the exponential in powers of
the small quantity $\mathbf{q}\cdot[\mathbf{X}_n-\mathbf{X}]$. With one factor of this quantity, the operator in the
matrix element involves two factors of coordinates, which can be assembled into
the spherical harmonics $Y_2^m$ and $Y_1^m$, which are respectively known as electric
quadrupole and magnetic dipole terms. With two factors of coordinates, these
operators are even under space inversion, so these contributions to the matrix
element vanish unless the initial and final states satisfy the selection rules

$$|j_i - j_f| \le 2 \le j_i + j_f , \quad \pi_i = \pi_f \qquad \text{electric quadrupole} \tag{7.5.48}$$
$$|j_i - j_f| \le 1 \le j_i + j_f , \quad \pi_i = \pi_f \qquad \text{magnetic dipole} . \tag{7.5.49}$$


For instance, in hydrogen the transition $3d \to 1s$ occurs as an electric
quadrupole transition. The rates of both electric quadrupole and magnetic dipole
transitions are suppressed relative to electric dipole transitions by factors of
order $(qr/\hbar)^2$, where $r$ is a characteristic atomic radius; for optical transitions,
this is of order $10^{-7}$.
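
The order of magnitude quoted here is easy to reproduce. In the sketch below the wavelength (5000 angstroms) and atomic radius (0.5 angstrom) are illustrative round numbers chosen for this example, not values given in the text; since $q/\hbar = 2\pi/\text{wavelength}$, the suppression factor is $(2\pi r/\text{wavelength})^2$.

```python
import math

# Illustrative inputs (assumed round numbers, not from the text).
WAVELENGTH_ANGSTROM = 5000.0   # a typical optical wavelength
ATOMIC_RADIUS_ANGSTROM = 0.5   # a characteristic atomic radius

# q r / hbar equals 2 pi r / wavelength, so the suppression factor is its square.
qr_over_hbar = 2.0 * math.pi * ATOMIC_RADIUS_ANGSTROM / WAVELENGTH_ANGSTROM
suppression = qr_over_hbar ** 2
print("(q r / hbar)^2 =", suppression)
```

With these inputs the factor is a few times $10^{-7}$; a somewhat larger radius or shorter wavelength moves it toward $10^{-6}$, so the estimate should be read as an order of magnitude only.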

We can go on, including higher and higher powers of $\mathbf{X}_n - \mathbf{X}$ in the expansion
of the exponential in Eq. (7.5.32), and also including effects of electron spin.
But whatever effects we include, there is one kind of transition in which
single-photon emission is completely forbidden: transitions between states that both
have total angular momentum zero. This is a simple consequence of angular
momentum conservation. As we have seen, a photon of helicity $\pm 1$ has an
angular momentum component in the direction of motion $\pm 1$, and therefore
cannot be emitted in a transition between states that have zero total angular
momentum. For instance, none of the excited states of $^{12}$C or $^{16}$O with $j = 0$
can emit a single photon in gamma decay to the $j = 0$ ground state. Such
transitions require the emission of pairs of photons, or if enough energy is
available, of electron–positron pairs.

Gauge Invariance and Charge Conservation


It is essential both for the consistency of the Maxwell equations and for gauge
invariance that the current four-vector $J^\mu(x)$ defined by Eq. (7.5.7) should be
conserved. In modern theories the matter with which electromagnetic fields
interact is described by a field theory, so we need to ask, in what sort of field
theory for matter is this current conserved? There is a simple answer. If charge
is to be conserved, then the net charge destroyed by the product of fields in any
term in the Lagrangian density must vanish, so if each field $\varphi_n$ destroys a charge
$e_n$ and creates a charge $-e_n$, then

$$\sum_n e_n\left[\frac{\partial\mathcal{L}}{\partial\varphi_n}\,\varphi_n + \frac{\partial\mathcal{L}}{\partial(\partial\varphi_n/\partial x^\mu)}\,\frac{\partial\varphi_n}{\partial x^\mu}\right] = 0 , \tag{7.5.50}$$


with summation of course understood over the repeated spacetime index $\mu$. This
is the same as saying that the Lagrangian density is invariant under the phase
transformation

$$\varphi_n \to [1 + i\epsilon e_n]\,\varphi_n , \tag{7.5.51}$$


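with $\epsilon$ an arbitrary infinitesimal. Using the Euler-Lagrange equation (7.1.8)
allows us to write Eq. (7.5.50) as a conservation equation

$$\frac{\partial J^\mu}{\partial x^\mu} = 0 , \tag{7.5.52}$$

where

$$J^\mu = -i\sum_n e_n\,\frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi_n/\partial x^\mu)}\,\varphi_n . \tag{7.5.53}$$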


(This is an example of the relation between symmetry principles and conservation
laws first expressed in the Noether theorem discussed in Section 5.7.) The
factor $-i$ is inserted here so that

$$J^0 = -i\sum_n e_n\,\pi_n\,\varphi_n , \tag{7.5.54}$$

and therefore

$$\Big[\int d^3x\,J^0(\mathbf{x},t),\ \varphi_n(\mathbf{y},t)\Big] = -e_n\,\varphi_n(\mathbf{y},t) , \tag{7.5.55}$$

which tells us that $e_n$ is indeed the value of the charge $\int J^0\,d^3x$ that is destroyed
by the field $\varphi_n$.

For the vector potential to interact with this conserved current, in the sense of
Eq. (7.5.7), it is sufficient to arrange that $\partial\varphi_n/\partial x^\mu$ and $A_\mu$ always occur in the
matter Lagrangian density in the combination

$$D_\mu\varphi_n \equiv \left[\frac{\partial}{\partial x^\mu} - ie_nA_\mu\right]\varphi_n , \tag{7.5.56}$$

so that

$$\frac{\partial\mathcal{L}_{\rm mat}}{\partial A_\mu} = -\sum_n ie_n\,\frac{\partial\mathcal{L}_{\rm mat}}{\partial(\partial\varphi_n/\partial x^\mu)}\,\varphi_n . \tag{7.5.57}$$

From this, Eq. (7.5.7) follows immediately.

For instance, we can use this prescription to include electromagnetic interactions
in the Lagrangian density (7.4.5) for a complex scalar field $\varphi$ that destroys
charge $e$:


v| 99, ‘Tag, 
Limat = —n" Fema Ee - ieavo| : 


Also, we used this prescription in the previous section to include electromag- 
netic interactions in the Lagrangian density for Dirac fields. 

There is a more general possibility, that in addition to depending on the
$\varphi_n$ and $D_\mu\varphi_n$, the matter Lagrangian density may also depend on the
gauge-invariant field-strength tensor $F_{\mu\nu}$. In this case, there is an additional term in
the current defined by Eq. (7.5.7):


OL wat 0 OL mat 
Jt =i n in P 7.5.58 
Loa agn [ax + A BF ae 


Because of the antisymmetry of $F_{\mu\nu}$, the new term in $J^\mu$ is separately
conserved. The possibility of this new term alerts us that the general principles
of electrodynamics do not in themselves fully dictate the parameters in the
Lagrangian that characterize the interaction of matter and radiation, including
the magnetic moments of various particles.


Local Phase and Matrix Transformations 


These prescriptions can be framed as consequences of an extended version of
gauge invariance. We have already noted that the condition (7.5.50) of charge
conservation is equivalent to the invariance of the Lagrangian density under the
phase transformation (7.5.51). But if $\partial\varphi_n/\partial x^\mu$ only appears in the Lagrangian
in the form (7.5.56), then the Lagrangian is invariant under a local phase
transformation, with $\epsilon$ an arbitrary infinitesimal function of spacetime coordinates,

$$\varphi_n(x) \to [1 + i\epsilon(x)e_n]\,\varphi_n(x) , \tag{7.5.59}$$

provided that the vector potential at the same time undergoes the gauge
transformation

$$A_\mu(x) \to A_\mu(x) + \frac{\partial\epsilon(x)}{\partial x^\mu} . \tag{7.5.60}$$

Today this reasoning is often run in reverse. It is assumed that the Lagrangian
density is invariant under the local phase transformation (7.5.59), with $\epsilon(x)$
an arbitrary infinitesimal function of spacetime coordinates, and from this the
existence is deduced of a vector field $A_\mu(x)$ whose properties are governed by
invariance under the gauge transformation (7.5.60).
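
The statement that the combination (7.5.56) transforms like $\varphi_n$ itself under (7.5.59) and (7.5.60) can be illustrated numerically. The following sketch is restricted to a single coordinate and uses made-up test functions for the field, the potential, and the gauge parameter, along with an arbitrary charge value; all of these are assumptions introduced only for the check. Working out the transformed derivative by hand leaves a remainder $e_n^2\,\epsilon\,\epsilon'\,\varphi_n$, so the mismatch between the transformed $D\varphi_n$ and $[1+i\epsilon e_n]D\varphi_n$ should shrink by a factor of 100 when $\epsilon$ is made ten times smaller.

```python
import cmath
import math

E_N = 0.7    # hypothetical charge e_n (arbitrary test value)
STEP = 1e-5  # step for central finite differences

def phi(x):
    """Made-up complex test field."""
    return cmath.exp(1.3j * x) * (1.0 + 0.5 * x * x)

def vector_potential(x):
    """Made-up test potential A(x) in one dimension."""
    return math.sin(2.0 * x)

def eps_shape(x):
    """Shape of the gauge function; it is multiplied by a small scale below."""
    return math.cos(0.9 * x)

def deriv(f, x):
    """Central finite-difference derivative."""
    return (f(x + STEP) - f(x - STEP)) / (2.0 * STEP)

def covariance_residual(scale, x=0.4):
    """|D'phi' - (1 + i eps e_n) D phi| at the point x, for eps = scale * eps_shape."""
    eps = lambda y: scale * eps_shape(y)
    phi_new = lambda y: (1.0 + 1j * eps(y) * E_N) * phi(y)   # Eq. (7.5.59)
    a_new = vector_potential(x) + deriv(eps, x)              # Eq. (7.5.60)
    d_old = deriv(phi, x) - 1j * E_N * vector_potential(x) * phi(x)
    d_new = deriv(phi_new, x) - 1j * E_N * a_new * phi_new(x)
    return abs(d_new - (1.0 + 1j * eps(x) * E_N) * d_old)

# Second-order behaviour: shrinking eps tenfold should shrink the mismatch ~100-fold.
ratio = covariance_residual(1e-2) / covariance_residual(1e-3)
print("residual ratio:", ratio)
```

Any term in the matter Lagrangian built from $\varphi_n$ and $D_\mu\varphi_n$ so as to be invariant under the constant phase transformation (7.5.51) is then automatically invariant to first order under the local one.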

Indeed, our Standard Model of elementary particles and forces is based on
an assumed invariance under a larger group of local transformations, not just
by $x$-dependent phases, as in Eq. (7.5.59), but transformations by $x$-dependent
matrices similar to those for isotopic spin rotations. From this, one deduces
the existence of a number of photon-like particles: some, the gluons with zero
mass, whose strong interactions prevent them from being observed in isolation,
and others that are observed, the $W^\pm$ and $Z^0$, that become massive as a result
of a spontaneous breakdown of the local gauge symmetry. But these matters are
beyond the scope of this book.


Assorted Problems 


1. Suppose that in a diatomic gas such as $\mathrm{H}_2$, the vibrational degrees of freedom
are fully excited, along with the rotational and translational degrees of
freedom. What is the ratio of the energy density of the gas to its pressure?
What does this tell you about the speed of sound in the gas?


2. Suppose that Einstein in 1905 had assumed that, in the radiation at temperature
$T$ in a cubical enclosure, the number $n$ of photons for each wave
number and polarization is not any positive integer but can only be $n = 0$,
$n = 1$, or $n = 2$. What would he have found for the energy density $\epsilon(\nu, T)$
per unit frequency interval at frequency $\nu$ and temperature $T$?


3. Suppose that in the 1910 experiment that revealed the existence of the
nucleus of the atom, the nucleus had been moving toward the radon alpha
ray source with speed $v_0$. What could one conclude about the mass of the
nucleus from the observation that alpha particles are sometimes scattered
straight backwards from the atom?


4. Suppose that the potential energy of an electron in the field of a nucleus is
not $-Ze^2/r$ but rather $V(r) = -g\,r^{-\eta}$, where $g$ and $\eta$ are positive-definite
constants, but that Bohr's quantization condition $m_e v_n r_n = n\hbar$ is still valid,
with $\hbar$ some constant and $n$ running over all positive-definite integers.

• What would Bohr in 1913 have found for the radii $r_n$, velocities $v_n$, and
energies $E_n$?

• For what values of $\eta$ do circular orbits exist that have $E_n < 0$?

• What would be found for the relation between $\hbar$ and $h$ if one imposed
Bohr's correspondence principle on the orbits with $n \gg 1$?


5. How does the pressure in a non-relativistic ideal gas vary when the mass
density varies adiabatically, assuming that the internal energy density is
either

• much bigger than the pressure, or
• equal to the pressure, or
• one percent of the pressure?

6. It is summer in Texas. The temperature outside is 104 °F (40 °C). In order
to keep the inside of your house at a comfortable 68 °F (20 °C) you need
to take $10^4$ joules per second of heat energy from inside to outside. For this
purpose you use an inverse Carnot cycle. How much power will you need
to run this?


7. In the electrolysis of water, how long does it take a 1 ampere current to
produce 1 gram of oxygen gas?


8. Fifteen grams of element X combine with three grams of hydrogen to
produce 18 grams of a compound Y of element X and hydrogen, with nothing
left over. Also, as gases all at the same temperature and pressure, 2 liters
of element X combine with 3 liters of hydrogen ($\mathrm{H}_2$) to give 2 liters of
compound Y. What is the chemical formula for compound Y, and what
is the atomic weight of element X? (Take the atomic weight of hydrogen
atoms as 1, and assume the validity of Avogadro's principle.)


9. Consider a particle of mass $m$ and velocity $\mathbf{v}$, with $|\mathbf{v}| \ll c$. Find the term in
the energy of this particle of order $mv^4/c^2$, and the term in the momentum
of order $mv^3/c^2$.


10. Suppose an observer who uses coordinates $x^\mu$ sees a uniform magnetic
field of magnitude $B_1$, pointing in the 1-direction, and zero electric field.
A second observer uses coordinates $x'^\mu = \Lambda^\mu{}_\nu x^\nu$, where $\Lambda^\mu{}_\nu$ is the
Lorentz transformation (4.2.6) that gives a body at rest a velocity with
magnitude $v$ in the 3-direction. What are the values of the components of
the electric and magnetic fields seen by the second observer?


11. In a spacetime with two space dimensions and one time dimension, the
electromagnetic field consists of a two-component electric field $\mathbf{E}$ and a
one-component magnetic field $B$. They satisfy differential equations


OE 0B OE 0B 
igh=——'4¢-", ApS - 6 
ot 0x2 ot Ox] 
0E, dEo dE, OE, 10B 
4np = ——+—, 0= - : 
Ox] 0x2 Ox 0x2 c Ot 


Find what kind of transformation properties, under (2 + 1)-dimensional
Lorentz transformations, we can give the field components $E_1$, $E_2$, and
$B$ and the densities $J_1$, $J_2$, and $\rho$ so that the above equations are Lorentz
invariant, in the sense that they are invariant under linear transformations
on spacetime intervals that leave $(\Delta x^1)^2 + (\Delta x^2)^2 - c^2(\Delta t)^2$ invariant.
Show that with the fields and densities transforming in this way, these
equations really are Lorentz invariant.


12. A particle known as a K meson, with mass 494 MeV/$c^2$, decays at rest into
a muon, with mass 106 MeV/$c^2$, and a neutrino, with negligible mass. Use
the conservation of energy and momentum to find the velocity of the muon.


13. What second-order partial differential equation (second-order in both time
and space derivatives) is satisfied by the de Broglie wave function for a free 
particle when we do not assume that its velocity is much less than c? 


14. When a beam of electrons of some definite energy is directed at a perfect
crystal, it is found that the largest angle $\theta$ between the incident and reflected
waves at which reflection is enhanced by constructive interference is 150°.
At what other value or values of $\theta$ is reflection enhanced by constructive
interference?


15. Suppose we measure the position of the electron in the lowest-energy state
of a hydrogen atom. What is the probability of finding that the electron is 
farther than 10~° cm from the nucleus? 


16. Consider an electron in a $d_{3/2}$ state with orbital angular momentum quantum
number $\ell = 2$, total angular momentum quantum number $j = 3/2$,
and total angular momentum 3-component $J_3 = \hbar/2$. Suppose we measure
the 3-component $S_3$ of the spin. What are the probabilities of getting the
results $S_3 = \hbar/2$ and $S_3 = -\hbar/2$? (Calculate whatever Clebsch–Gordan
coefficients you need; do not just look them up in a table.)


17. Suppose the electron has spin 3/2 rather than 1/2, but that all other
properties of electrons and nuclei are as they are in the real world. What would
you expect would be the atomic numbers $Z$ of the two lightest halogen
elements, that behave like fluorine and chlorine in our world?


18. When a free electron is placed in a uniform magnetic field $\mathbf{B}$ pointing in the
1-direction, the Hamiltonian becomes

$$H = \frac{\mathbf{p}^2}{2m_e} + \mu\,|\mathbf{B}|\,S_1 ,$$

where $\mathbf{S}$ is the operator representing the electron spin vector and $\mu$ is a
constant, related to the electron magnetic moment. Suppose that at $t = 0$
the expectation value of the spin vector has components

$$\langle S_1\rangle = \langle S_2\rangle = 0 , \qquad \langle S_3\rangle = \hbar/2 .$$

What are the expectation values of the spin vector components at any later
time?


19. Suppose that the interaction of the electron in a hydrogen atom with some
sort of external field produces a term in the potential

$$\Delta V(r) = g\,r ,$$

where $g$ is a small constant. Calculate the terms in the resulting shift in the
energy of the $1s$ state that are of first and second order in $g$.


20. Suppose that the spin-orbit coupling of the electron in hydrogen produces
a term in the Hamiltonian

$$\Delta H = \xi\,\mathbf{L}\cdot\mathbf{S} ,$$

where $\xi$ is a constant, and $\mathbf{L}$ and $\mathbf{S}$ are the orbital angular momentum and
spin angular momentum of the electron. What does this term contribute to
the fine-structure splitting between the $2p_{1/2}$ and $2p_{3/2}$ states of hydrogen?


21. Consider the scattering of a spinless particle of mass $m$ and momentum $p$
by a central potential

$$V(r) = V_0\exp\left(-r^2/R^2\right) ,$$

where $V_0$ and $R$ are constants. Use the Born approximation to give a formula
for the scattering amplitude in the limit $pR \ll \hbar$.


22. Consider a one-particle system with a Lagrangian

$$L = \frac{m}{2}\left(\frac{d\mathbf{X}}{dt}\right)^2 + \left(\frac{d\mathbf{X}}{dt}\right)\cdot\mathbf{V}(\mathbf{X}) ,$$

where $\mathbf{V}$ is some vector function of position $\mathbf{X}$.

• What equation of motion is satisfied by $\mathbf{X}$?

• Find the Hamiltonian of this theory.

• What is the differential equation satisfied by the wave function $\psi(\mathbf{x})$ of
a state with a definite energy $E$?


23. Consider a particle of charge $e$ and mass $m$ in classical electromagnetic
potentials that depend on time as well as position, with Hamiltonian

$$H(\mathbf{X},\mathbf{P}) = \frac{1}{2m}\left[\mathbf{P} - \frac{e}{c}\mathbf{A}(\mathbf{X},t)\right]^2 + e\,\phi(\mathbf{X},t) .$$

Suppose you perform a time-dependent and position-dependent gauge
transformation, to new potentials

$$\mathbf{A}' = \mathbf{A} + \nabla\xi , \qquad \phi' = \phi - \frac{1}{c}\frac{\partial\xi}{\partial t} ,$$

where $\xi$ is an arbitrary real function of position and time. What is the
relation between the wave function $\psi'(\mathbf{x},t)$ that satisfies the time-dependent
Schrödinger equation for the new potentials and the wave function $\psi(\mathbf{x},t)$
that satisfies the time-dependent Schrödinger equation for the original
potentials?


24. Find the coordinate-space wave function of the one-particle state with
angular momentum $\ell = 0$ and energy $V_0 + 2\hbar\omega$ in the harmonic oscillator
potential (6.3.1).


25. Suppose that the potential felt by an alpha particle for radius $r$ outside the
nuclear radius $R$ is not the Coulomb potential, but instead $V(r) = g/r^2$,
where $g$ is some positive constant. Calculate the exponential suppression
factor in the rate of decay of an unstable alpha particle state with energy
$E < g/R^2$.


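26. Consider the theory of a neutral spinless particle $A$ and a non-neutral
spinless particle $B$, with Lagrangian density

$$\mathcal{L} = -\frac{1}{2}\eta^{\mu\nu}\frac{\partial\varphi_A}{\partial x^\mu}\frac{\partial\varphi_A}{\partial x^\nu} - \frac{m_A^2}{2}\varphi_A^2 - \eta^{\mu\nu}\frac{\partial\varphi_B^\dagger}{\partial x^\mu}\frac{\partial\varphi_B}{\partial x^\nu} - m_B^2\,\varphi_B^\dagger\varphi_B - g\,\varphi_A\varphi_B^\dagger\varphi_B .$$

Calculate the S-matrix elements for the processes $A + B \to A + B$ and
$B + \bar{B} \to A + A$ to lowest order in $g$, where $\bar{B}$ is the antiparticle of $B$.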


27. Calculate the rate for emission of a photon in the transition $2p \to 1s$ in
hydrogen. Derive formulas and use them to find numerical values. You can
use the facts that the proton is much heavier than the electron, and that the
wavelength of the photon emitted in this process is much larger than the
atomic size, and you can neglect electron spin.


28. What powers of the photon wave number appear in the rates for
single-photon emission in the decays of the 4 state of hydrogen into the $2s$ and
$2p$ states?